arXiv:2009.03300 (v3, latest)
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing "Measuring Massive Multitask Language Understanding" (50 of 4,483 papers shown)
Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs
Yixiao Zhou
Ziyu Zhao
Dongzhou Cheng
Zhiliang Wu
Jie Gui
Yi-feng Yang
Fei Wu
Yu Cheng
Hehe Fan
MoMe
MoE
164
5
0
12 Sep 2025
VARCO-VISION-2.0 Technical Report
Young-rok Cha
Jeongho Ju
SunYoung Park
Jong-Hyeon Lee
Younghyun Yu
Youngjune Kim
VLM
215
2
0
12 Sep 2025
Automated MCQA Benchmarking at Scale: Evaluating Reasoning Traces as Retrieval Sources for Domain Adaptation of Small Language Models
Ozan Gokdemir
N. Getty
Robert Underwood
Sandeep Madireddy
Franck Cappello
Arvind Ramanathan
Ian Foster
R. Stevens
ELM
LRM
112
1
0
12 Sep 2025
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation
Iman Barati
Mostafa Amiri
Heshaam Faili
ALM
137
0
0
12 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
320
3
0
12 Sep 2025
Measuring Epistemic Humility in Multimodal Large Language Models
Bingkui Tong
Jiaer Xia
Sifeng Shang
Kaiyang Zhou
HILM
143
2
0
11 Sep 2025
TORSO: Template-Oriented Reasoning Towards General Tasks
Minhyuk Kim
Seungyoon Lee
Heuiseok Lim
LRM
189
0
0
11 Sep 2025
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Akshit Sinha
Arvindh Arun
Shashwat Goel
Steffen Staab
Jonas Geiping
ALM
LRM
300
8
0
11 Sep 2025
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Bingxin Xu
Zhen Dong
Oussama Elachqar
Yuzhang Shang
MQ
192
1
0
11 Sep 2025
Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning
Wei Huang
Anda Cheng
Yinggui Wang
KELM
MoMe
CLL
142
1
0
10 Sep 2025
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
Jiaming Yan
Jianchun Liu
Hongli Xu
Liusheng Huang
MoE
127
5
0
10 Sep 2025
Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison
Marianna Nezhurina
Jörg Franke
Taishi Nakamura
Timur Carstensen
Niccolò Ajroldi
Ville Komulainen
David Salinas
J. Jitsev
178
2
0
10 Sep 2025
Causal Attention with Lookahead Keys
Zhuoqing Song
Peng Sun
Huizhuo Yuan
Quanquan Gu
CML
189
0
0
09 Sep 2025
Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
Sankalp Tattwadarshi Swain
Anshika Krishnatray
Dhruv Kumar
Jagat Sesh Challa
LLMAG
63
0
0
09 Sep 2025
Performance Assessment Strategies for Generative AI Applications in Healthcare
Victor Garcia
Mariia Sidulova
Aldo Badano
141
0
0
09 Sep 2025
MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations
Ruggero Marino Lazzaroni
Alessandro Angioi
Michelangelo Puliga
Davide Sanna
Roberto Marras
LM&MA
ELM
148
1
0
08 Sep 2025
Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing
Yuanteng Chen
Peisong Wang
Yuantian Shao
Nanxin Zeng
Chang Xu
Jian Cheng
MoE
178
0
0
08 Sep 2025
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Eugene Kwek
Wenpeng Yin
VLM
265
0
0
08 Sep 2025
Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian
Michael Hoffmann
Jophin John
Stefan Schweter
Gokul Ramakrishnan
Hoi-Fong Mak
Alice Zhang
Dmitry Gaynullin
Nicolay J. Hammer
CLL
162
1
0
06 Sep 2025
Hyperbolic Large Language Models
Sarang Patil
Zeyong Zhang
Yiran Huang
Tengfei Ma
Mengjia Xu
AI4CE
215
0
0
06 Sep 2025
PLaMo 2 Technical Report
Preferred Networks
Kaizaburo Chubachi
Yasuhiro Fujita
Shinichi Hemmi
Yuta Hirokawa
...
Daisuke Tanaka
Avinash Ummadisingu
Hanqin Wang
Sixue Wang
Tianqi Xu
MoE
VLM
123
0
0
05 Sep 2025
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Cheng Li
Jiexiong Liu
Yixuan Chen
Jie ji
MoE
105
0
0
05 Sep 2025
Knowledge Collapse in LLMs: When Fluency Survives but Facts Fail under Recursive Synthetic Training
Figarri Keisha
Zekun Wu
Ze Wang
Adriano Soares Koshiyama
Philip C. Treleaven
KELM
178
0
0
05 Sep 2025
Personality as a Probe for LLM Evaluation: Method Trade-offs and Downstream Effects
Gunmay Handa
Zekun Wu
Adriano Soares Koshiyama
Philip C. Treleaven
126
1
0
05 Sep 2025
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
Andrea Wynn
Harsh Satija
Gillian Hadfield
175
10
0
05 Sep 2025
Hunyuan-MT Technical Report
Mao Zheng
Zheng Li
Bingxin Qu
Mingyang Song
Yang Du
Mingrui Sun
Di Wang
LRM
137
2
0
05 Sep 2025
Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too
Logan Lawrence
Ashton Williamson
Alexander Shelton
ELM
113
0
0
05 Sep 2025
What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
Yuan Sui
Yanming Zhang
Yi Liao
Yu Gu
Guohua Tang
Zhongqian Sun
Wei Yang
Xu Cheng
LLMAG
330
0
0
05 Sep 2025
Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning
Wei Yang
Jesse Thomason
192
5
0
04 Sep 2025
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
Riccardo Lunardi
V. D. Mea
Stefano Mizzaro
Kevin Roitero
165
5
0
04 Sep 2025
SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
Yuqing Huang
Rongyang Zhang
Qimeng Wang
Chengqiang Lu
Yan Gao
...
Xuyang Zhi
Guiquan Liu
Xin Li
Hao Wang
Tong Xu
CLL
178
2
0
04 Sep 2025
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Yang Wang
Chenghao Xiao
Chia-Yi Hsiao
Zi Yan Chang
Chi-Li Chen
Tyler Loakman
Chenghua Lin
256
1
0
04 Sep 2025
RL's Razor: Why Online Reinforcement Learning Forgets Less
Idan Shenfeld
Jyothish Pari
Pulkit Agrawal
CLL
192
43
0
04 Sep 2025
Set Block Decoding is a Language Model Inference Accelerator
Itai Gat
Heli Ben-Hamu
Marton Havasi
Daniel Haziza
Jeremy Reizenstein
Gabriel Synnaeve
David Lopez-Paz
Brian Karrer
Y. Lipman
150
6
0
04 Sep 2025
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Qinyan Zhang
X. Lei
Ruijie Miao
Y. Fu
Haojie Fan
...
Jiaheng Liu
Tong Yang
Z. Wang
G. Zhang
Wenhao Huang
135
1
0
04 Sep 2025
Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
Xiaobo Wang
Zixia Jia
Jiaqi Li
Qi Liu
Zilong Zheng
104
0
0
03 Sep 2025
SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala
Ashmari Pramodya
Nirasha Nelki
Heshan Shalinda
Chamila Liyanage
Yusuke Sakai
Randil Pushpananda
Ruvan Weerasinghe
Hidetaka Kamigaito
Taro Watanabe
LRM
156
0
0
03 Sep 2025
Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving
Fangzhou Wu
Sandeep Silwal
235
0
0
02 Sep 2025
Behavioral Fingerprinting of Large Language Models
Zehua Pei
Hui-Ling Zhen
Ying Zhang
Zhiyuan Yang
Xing Li
Xianzhi Yu
Mingxuan Yuan
Bei Yu
78
2
0
02 Sep 2025
Perturbing the Derivative: Wild Refitting for Model-Free Evaluation of Machine Learning Models under Bregman Losses
Haichen Hu
David Simchi-Levi
454
0
0
02 Sep 2025
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Naman D. Singh
Maximilian Müller
Francesco Croce
Matthias Hein
MU
KELM
CLL
195
4
0
02 Sep 2025
JudgeAgent: Beyond Static Benchmarks for Knowledge-Driven and Dynamic LLM Evaluation
Zhichao Shi
Xuhui Jiang
Chengjin Xu
Cangli Yao
Zhenxin Huang
Shengjie Ma
Yinghan Shen
Jian Guo
Yuanzhuo Wang
LLMAG
ELM
296
0
0
02 Sep 2025
Implicit Reasoning in Large Language Models: A Comprehensive Survey
Jindong Li
Yali Fu
Li Fan
Jiahong Liu
Yao Shu
Chengwei Qin
Menglin Yang
Irwin King
Rex Ying
OffRL
LRM
AI4CE
229
14
0
02 Sep 2025
LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
Krishna Teja Chitty-Venkata
Sandeep Madireddy
M. Emani
V. Vishwanath
MoE
160
1
0
02 Sep 2025
Dream-Coder 7B: An Open Diffusion Language Model for Code
Zhihui Xie
Jiacheng Ye
Lin Zheng
Lei Li
Jingwei Dong
...
Xueliang Zhao
Shansan Gong
Xin Jiang
Zhenguo Li
Lingpeng Kong
DiffM
139
22
0
01 Sep 2025
LongCat-Flash Technical Report
M-A-P Team
Bayan
Bei Li
Bingye Lei
Bo Wang
...
Rongxiang Weng
Ruichen Shao
Rumei Li
Shizhe Wu
Shuai Liang
MLLM
MoE
VLM
403
15
0
01 Sep 2025
An LLM-enabled semantic-centric framework to consume privacy policies
Rui Zhao
Vladyslav Melnychuk
Jun Zhao
Jesse Wright
Nigel Shadbolt
157
0
0
01 Sep 2025
Culture is Everywhere: A Call for Intentionally Cultural Evaluation
Juhyun Oh
Inha Cha
Michael Saxon
Hyunseung Lim
Shaily Bhatt
Alice Oh
207
1
0
01 Sep 2025
REFRAG: Rethinking RAG based Decoding
Xiaoqiang Lin
Aritra Ghosh
Bryan Kian Hsiang Low
Anshumali Shrivastava
Vijai Mohan
LLMAG
229
1
0
01 Sep 2025
Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs
Andong Hua
Kenan Tang
Chenhe Gu
Jindong Gu
Eric Wong
Yao Qin
LRM
118
2
0
01 Sep 2025