Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown
CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models
Zhuxuanzi Wang
Mingqiao Mo
Xi Xiao
Chen Liu
Chenrui Ma
Yunbei Zhang
Xiao Wang
Smita Krishnaswamy
Tianyang Wang
136
0
0
11 Oct 2025
Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
Guanbin Li
Miao Yu
Moayad Aloqaily
Zhenhong Zhou
Kun Wang
Linsey Pang
Prakhar Mehrotra
Qingsong Wen
AAML
81
0
0
11 Oct 2025
ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
Shivam Patel
Neharika Jali
Ankur Mallick
Gauri Joshi
147
2
0
10 Oct 2025
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
Hyundong Jin
Joonghyuk Hahn
Yo-Sub Han
LRM
92
0
0
10 Oct 2025
Hierarchical Scheduling for Multi-Vector Image Retrieval
Maoliang Li
K. Li
Yaoyang Liu
Jiayu Chen
Zihao Zheng
Yinjun Wu
Xiang Chen
127
2
0
10 Oct 2025
Don't Throw Away Your Pretrained Model
Shangbin Feng
Wenhao Yu
Yike Wang
Hongming Zhang
Yulia Tsvetkov
Dong Yu
MoMe
230
2
0
10 Oct 2025
NarraBench: A Comprehensive Framework for Narrative Benchmarking
Sil Hamilton
Matthew Wilkens
Andrew Piper
205
1
0
10 Oct 2025
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics
Yuchen Lu
Run Yang
Yichen Zhang
Shuguang Yu
Runpeng Dai
...
X. Wei
Jiani Gu
Rui Sun
Jiaxuan Jia
Fan Zhou
OffRL, ALM, ELM, LRM
216
1
0
10 Oct 2025
Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar
Deepak Nathani
William Yang Wang
Tanmoy Chakraborty
136
0
0
10 Oct 2025
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
Qiaosheng Chen
Y. Liu
Lei Li
Kai Chen
Q. Guo
Gong Cheng
Fei Yuan
ELM
165
1
0
10 Oct 2025
Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors
Yihong Liu
Raoyuan Zhao
Lena Altinger
Hinrich Schütze
Michael A. Hedderich
AAML
140
2
0
10 Oct 2025
LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data
Changsheng Wang
Yihua Zhang
Dennis L. Wei
Jinghan Jia
Pin-Yu Chen
Sijia Liu
MU
181
0
0
10 Oct 2025
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
Junyan Ye
Dongzhi Jiang
Jun-Jian He
Baichuan Zhou
Zilong Huang
Zhiyuan Yan
Jiaming Song
Conghui He
Weijia Li
ReLM, VLM, LRM
127
2
0
10 Oct 2025
A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages
Raoyuan Zhao
Yihong Liu
Hinrich Schütze
Michael A. Hedderich
LRM
74
2
0
10 Oct 2025
A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
Bibekananda Patra
Aditya Mahesh Kolte
Sandipan Bandyopadhyay
125
12
0
10 Oct 2025
Users as Annotators: LLM Preference Learning from Comparison Mode
Zhongze Cai
Xiaocheng Li
95
0
0
10 Oct 2025
Attention to Non-Adopters
Kaitlyn Zhou
Kristina Gligorić
Myra Cheng
Michelle S. Lam
Vyoma Raman
Boluwatife Aminu
Caeley Woo
Michael Brockman
Hannah Cha
Dan Jurafsky
102
1
0
10 Oct 2025
MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics
Jiapeng Wang
Changxin Tian
Kunlong Chen
Ziqi Liu
Jiaxin Mao
Wayne Xin Zhao
Zhiqiang Zhang
Jun Zhou
114
1
0
10 Oct 2025
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue
Yifan Zhou
G. Zhang
Zaibin Zhang
Y. Li
Chen Zhang
Z. Yin
Philip Torr
Wanli Ouyang
Lei Bai
LLMAG
142
3
0
09 Oct 2025
Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training
Ruizhe Wang
Yucheng Ding
Xiao Liu
Yaoxiang Wang
Peng Cheng
Baining Guo
Zhengjun Zha
Yeyun Gong
145
0
0
09 Oct 2025
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
Jingyuan Wang
Yankai Chen
Zhonghang Li
Chao Huang
LRM
102
0
0
09 Oct 2025
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Qiaozhe Zhang
Jun Sun
Ruijie Zhang
Yingzhuang Liu
200
0
0
09 Oct 2025
Lossless Vocabulary Reduction for Auto-Regressive Language Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Shin'ya Yamaguchi
Tomoya Ohba
Tamao Sakao
Susumu Takeuchi
104
1
0
09 Oct 2025
Guided Star-Shaped Masked Diffusion
Viacheslav Meshchaninov
Egor Shibaev
Artem Makoian
Ivan Klimov
Danil Sheshenya
A. Malinin
Nikita Balagansky
Daniil Gavrilov
Aibek Alanov
Dmitry Vetrov
DiffM
172
1
0
09 Oct 2025
GCPO: When Contrast Fails, Go Gold
Hao Wu
Wei Liu
124
1
0
09 Oct 2025
FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling
Zhengyu Wu
Yinlin Zhu
Xunkai Li
Ziang Qiu
Rong-Hua Li
Guoren Wang
Chenghu Zhou
FedML
149
0
0
09 Oct 2025
FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning
Yunbo Li
Jiaping Gui
Zhihang Deng
Fanchao Meng
Yue Wu
FedML
353
5
0
09 Oct 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Andong Deng
Taojiannan Yang
S. Yu
Lincoln Spencer
Mohit Bansal
Chen Chen
Serena Yeung-Levy
Xiaohan Wang
LRM
135
3
0
09 Oct 2025
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu
Jacob Dineen
Y. Huang
Sheng Zhang
Hoifung Poon
Ben Zhou
Muhao Chen
ELM
147
0
0
09 Oct 2025
Contrastive Weak-to-strong Generalization
Houcheng Jiang
Junfeng Fang
Jiaxin Wu
T. Zhang
Chen Gao
Yong Li
X. Wang
Xiangnan He
Yang Deng
141
0
0
09 Oct 2025
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
Jingyu Zhang
Haozhu Wang
Eric Michael Smith
Sid Wang
Amr Sharaf
Mahesh Pasupuleti
Benjamin Van Durme
Daniel Khashabi
Jason Weston
Hongyuan Zhan
134
1
0
09 Oct 2025
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Kazuki Egashira
Robin Staab
Thibaud Gloaguen
Mark Vero
Martin Vechev
AAML
217
1
0
09 Oct 2025
How Reliable is Language Model Micro-Benchmarking?
Gregory Yauney
Shahzaib Saqib Warraich
Swabha Swayamdipta
ALM
152
0
0
09 Oct 2025
Energy-Driven Steering: Reducing False Refusals in Large Language Models
Eric Hanchen Jiang
Weixuan Ou
Run Liu
Shengyuan Pang
Guancheng Wan
...
Wei Dong
Kai-Wei Chang
Xiaofeng Wang
Ying Nian Wu
Xinfeng Li
LLMSV
245
0
0
09 Oct 2025
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
Alexander Rubinstein
Benjamin Raible
Martin Gubri
Seong Joon Oh
ELM
401
0
1
09 Oct 2025
Benchmarking is Broken -- Don't Let AI be its Own Judge
Zerui Cheng
Stella Wohnig
Ruchika Gupta
Samiul Alam
Tassallah Abdullahi
...
Daniel Kirste
Aaron Gokaslan
Mikołaj Glinka
Carsten Eickhoff
Ruben Wolff
ELM
168
1
0
08 Oct 2025
JAI-1: A Thai-Centric Large Language Model
Attapol T. Rutherford
Jullajak Karnjanaekarin
Narongkorn Panitsrisit
Pontakorn Trakuekul
Sumana Sumanakul
Natchanon Pollertlam
83
0
0
08 Oct 2025
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts
Yeskendir Koishekenov
Aldo Lipani
Nicola Cancedda
LRM
154
2
0
08 Oct 2025
Native Hybrid Attention for Efficient Sequence Modeling
Jusen Du
Jiaxi Hu
Tao Zhang
Weigao Sun
Yu Cheng
215
3
0
08 Oct 2025
Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs
Wang Wei
Tiankai Yang
Hongjie Chen
Yue Zhao
Franck Dernoncourt
Ryan Rossi
Hoda Eldardiry
OffRL
92
2
0
08 Oct 2025
Online Rubrics Elicitation from Pairwise Comparisons
MohammadHossein Rezaei
Robert Vacareanu
Zihao Wang
Clinton Wang
Bing Liu
Yunzhong He
Afra Feyza Akyürek
OffRL
199
2
0
08 Oct 2025
Sunflower: A New Approach To Expanding Coverage of African Languages in Large Language Models
Benjamin Akera
Evelyn Nafula Ouma
Gilbert Yiga
Patrick Walukagga
Phionah Natukunda
...
Imran Sekalala
Nimpamya Janat Namara
Engineer Bainomugisha
Ernest Mwebaze
John Quinn
195
0
0
08 Oct 2025
Auto-Stega: An Agent-Driven System for Lifelong Strategy Evolution in LLM-Based Text Steganography
Jiuan Zhou
Yu Cheng
Yuan Xie
Z. Yin
128
4
0
08 Oct 2025
Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer
Jing-Zong Zhang
Shuang Guo
Li-Lin Zhu
Lingxiao Wang
Guo-Liang Ma
163
10
0
08 Oct 2025
Pragyaan: Designing and Curating High-Quality Cultural Post-Training Datasets for Indian Languages
Neel Prabhanjan Rachamalla
Aravind Konakalla
Gautam Rajeev
Ashish Kulkarni
Chandra Khatri
Shubham Agarwal
151
1
0
08 Oct 2025
Grouped Differential Attention
Junghwan Lim
S. W. Lee
Dongseok Kim
Wai Ting Cheung
Beomgyu Kim
Taehwan Kim
Haesol Lee
Junhyeok Lee
Dongpin Oh
Eunhwan Park
106
1
0
08 Oct 2025
LLM Unlearning Under the Microscope: A Full-Stack View on Methods and Metrics
Chongyu Fan
Changsheng Wang
Yancheng Huang
Soumyadeep Pal
Sijia Liu
MU, ELM
196
0
0
08 Oct 2025
Inefficiencies of Meta Agents for Agent Design
Batu El
Mert Yuksekgonul
J. Zou
AIFin
118
0
0
08 Oct 2025
PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch
Shangjian Yin
Shining Liang
Wenbiao Ding
Yuli Qian
Zhouxing Shi
Hongzhi Li
Yutao Xie
ALM
119
0
0
08 Oct 2025
A multi-layered embedded intrusion detection framework for programmable logic controllers
Rishabh Das
Aaron Werth
Tommy Morris
112
0
0
08 Oct 2025