arXiv:2009.03300, v3 (latest)
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing "Measuring Massive Multitask Language Understanding"
50 / 4,477 papers shown
Title
VARCO-VISION-2.0 Technical Report
Young-rok Cha
Jeongho Ju
SunYoung Park
Jong-Hyeon Lee
Younghyun Yu
Youngjune Kim
VLM
201
1
0
12 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
288
3
0
12 Sep 2025
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation
Iman Barati
Mostafa Amiri
Heshaam Faili
ALM
128
0
0
12 Sep 2025
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Bingxin Xu
Zhen Dong
Oussama Elachqar
Yuzhang Shang
MQ
164
1
0
11 Sep 2025
Measuring Epistemic Humility in Multimodal Large Language Models
Bingkui Tong
Jiaer Xia
Sifeng Shang
Kaiyang Zhou
HILM
124
2
0
11 Sep 2025
TORSO: Template-Oriented Reasoning Towards General Tasks
Minhyuk Kim
Seungyoon Lee
Heuiseok Lim
LRM
162
0
0
11 Sep 2025
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Akshit Sinha
Arvindh Arun
Shashwat Goel
Steffen Staab
Jonas Geiping
ALM
LRM
269
7
0
11 Sep 2025
Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning
Wei Huang
Anda Cheng
Yinggui Wang
KELM
MoMe
CLL
103
0
0
10 Sep 2025
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
Jiaming Yan
Jianchun Liu
Hongli Xu
Liusheng Huang
MoE
122
5
0
10 Sep 2025
Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison
Marianna Nezhurina
Jörg Franke
Taishi Nakamura
Timur Carstensen
Niccolò Ajroldi
Ville Komulainen
David Salinas
J. Jitsev
141
2
0
10 Sep 2025
Causal Attention with Lookahead Keys
Zhuoqing Song
Peng Sun
Huizhuo Yuan
Quanquan Gu
CML
168
0
0
09 Sep 2025
Performance Assessment Strategies for Generative AI Applications in Healthcare
Victor Garcia
Mariia Sidulova
Aldo Badano
130
0
0
09 Sep 2025
Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
Sankalp Tattwadarshi Swain
Anshika Krishnatray
Dhruv Kumar
Jagat Sesh Challa
LLMAG
48
0
0
09 Sep 2025
MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations
Ruggero Marino Lazzaroni
Alessandro Angioi
Michelangelo Puliga
Davide Sanna
Roberto Marras
LM&MA
ELM
148
1
0
08 Sep 2025
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Eugene Kwek
Wenpeng Yin
VLM
236
0
0
08 Sep 2025
Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing
Yuanteng Chen
Peisong Wang
Yuantian Shao
Nanxin Zeng
Chang Xu
Jian Cheng
MoE
154
0
0
08 Sep 2025
Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian
Michael Hoffmann
Jophin John
Stefan Schweter
Gokul Ramakrishnan
Hoi-Fong Mak
Alice Zhang
Dmitry Gaynullin
Nicolay J. Hammer
CLL
156
1
0
06 Sep 2025
Hyperbolic Large Language Models
Sarang Patil
Zeyong Zhang
Yiran Huang
Tengfei Ma
Mengjia Xu
AI4CE
206
1
0
06 Sep 2025
PLaMo 2 Technical Report
Preferred Networks
Kaizaburo Chubachi
Yasuhiro Fujita
Shinichi Hemmi
Yuta Hirokawa
...
Daisuke Tanaka
Avinash Ummadisingu
Hanqin Wang
Sixue Wang
Tianqi Xu
MoE
VLM
105
0
0
05 Sep 2025
Personality as a Probe for LLM Evaluation: Method Trade-offs and Downstream Effects
Gunmay Handa
Zekun Wu
Adriano Soares Koshiyama
Philip C. Treleaven
120
1
0
05 Sep 2025
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
Andrea Wynn
Harsh Satija
Gillian Hadfield
171
9
0
05 Sep 2025
Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too
Logan Lawrence
Ashton Williamson
Alexander Shelton
ELM
97
0
0
05 Sep 2025
What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
Yuan Sui
Yanming Zhang
Yi Liao
Yu Gu
Guohua Tang
Zhongqian Sun
Wei Yang
Xu Cheng
LLMAG
281
0
0
05 Sep 2025
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Cheng Li
Jiexiong Liu
Yixuan Chen
Jie Ji
MoE
74
0
0
05 Sep 2025
Hunyuan-MT Technical Report
Mao Zheng
Zheng Li
Bingxin Qu
Mingyang Song
Yang Du
Mingrui Sun
Di Wang
LRM
119
2
0
05 Sep 2025
Knowledge Collapse in LLMs: When Fluency Survives but Facts Fail under Recursive Synthetic Training
Figarri Keisha
Zekun Wu
Ze Wang
Adriano Soares Koshiyama
Philip C. Treleaven
KELM
135
0
0
05 Sep 2025
Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning
Wei Yang
Jesse Thomason
162
5
0
04 Sep 2025
SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
Yuqing Huang
Rongyang Zhang
Qimeng Wang
Chengqiang Lu
Yan Gao
...
Xuyang Zhi
Guiquan Liu
Xin Li
Hao Wang
Tong Xu
CLL
163
2
0
04 Sep 2025
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Qinyan Zhang
X. Lei
Ruijie Miao
Y. Fu
Haojie Fan
...
Jiaheng Liu
Tong Yang
Z. Wang
G. Zhang
Wenhao Huang
104
1
0
04 Sep 2025
RL's Razor: Why Online Reinforcement Learning Forgets Less
Idan Shenfeld
Jyothish Pari
Pulkit Agrawal
CLL
171
41
0
04 Sep 2025
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Yang Wang
Chenghao Xiao
Chia-Yi Hsiao
Zi Yan Chang
Chi-Li Chen
Tyler Loakman
Chenghua Lin
235
1
0
04 Sep 2025
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs
Riccardo Lunardi
V. D. Mea
Stefano Mizzaro
Kevin Roitero
164
3
0
04 Sep 2025
Set Block Decoding is a Language Model Inference Accelerator
Itai Gat
Heli Ben-Hamu
Marton Havasi
Daniel Haziza
Jeremy Reizenstein
Gabriel Synnaeve
David Lopez-Paz
Brian Karrer
Y. Lipman
142
6
0
04 Sep 2025
SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala
Ashmari Pramodya
Nirasha Nelki
Heshan Shalinda
Chamila Liyanage
Yusuke Sakai
Randil Pushpananda
Ruvan Weerasinghe
Hidetaka Kamigaito
Taro Watanabe
LRM
131
0
0
03 Sep 2025
Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
Xiaobo Wang
Zixia Jia
Jiaqi Li
Qi Liu
Zilong Zheng
96
0
0
03 Sep 2025
Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving
Fangzhou Wu
Sandeep Silwal
221
0
0
02 Sep 2025
Implicit Reasoning in Large Language Models: A Comprehensive Survey
Jindong Li
Yali Fu
Li Fan
Jiahong Liu
Yao Shu
Chengwei Qin
Menglin Yang
Irwin King
Rex Ying
OffRL
LRM
AI4CE
196
10
0
02 Sep 2025
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Naman D. Singh
Maximilian Müller
Francesco Croce
Matthias Hein
MU
KELM
CLL
187
4
0
02 Sep 2025
Behavioral Fingerprinting of Large Language Models
Zehua Pei
Hui-Ling Zhen
Ying Zhang
Zhiyuan Yang
Xing Li
Xianzhi Yu
Mingxuan Yuan
Bei Yu
72
1
0
02 Sep 2025
LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
Krishna Teja Chitty-Venkata
Sandeep Madireddy
M. Emani
V. Vishwanath
MoE
151
1
0
02 Sep 2025
Perturbing the Derivative: Wild Refitting for Model-Free Evaluation of Machine Learning Models under Bregman Losses
Haichen Hu
David Simchi-Levi
409
1
0
02 Sep 2025
JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer
Zhichao Shi
Xuhui Jiang
Chengjin Xu
Cangli Yao
Zhenxin Huang
Shengjie Ma
Yinghan Shen
Jian Guo
Yuanzhuo Wang
LLMAG
ELM
227
0
0
02 Sep 2025
Dream-Coder 7B: An Open Diffusion Language Model for Code
Zhihui Xie
Jiacheng Ye
Lin Zheng
Lei Li
Jingwei Dong
...
Xueliang Zhao
Shansan Gong
Xin Jiang
Zhenguo Li
Lingpeng Kong
DiffM
115
17
0
01 Sep 2025
Culture is Everywhere: A Call for Intentionally Cultural Evaluation
Juhyun Oh
Inha Cha
Michael Saxon
Hyunseung Lim
Shaily Bhatt
Alice Oh
193
0
0
01 Sep 2025
LongCat-Flash Technical Report
M-A-P Team
Bayan
Bei Li
Bingye Lei
Bo Wang
...
Rongxiang Weng
Ruichen Shao
Rumei Li
Shizhe Wu
Shuai Liang
MLLM
MoE
VLM
384
14
0
01 Sep 2025
Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs
Andong Hua
Kenan Tang
Chenhe Gu
Jindong Gu
Eric Wong
Yao Qin
LRM
107
1
0
01 Sep 2025
KoBLEX: Open Legal Question Answering with Multi-hop Reasoning
Jihyung Lee
Daehui Kim
Seonjeong Hwang
Hyounghun Kim
G. G. Lee
RALM
ELM
123
1
0
01 Sep 2025
REFRAG: Rethinking RAG based Decoding
Xiaoqiang Lin
Aritra Ghosh
Bryan Kian Hsiang Low
Anshumali Shrivastava
Vijai Mohan
LLMAG
186
1
0
01 Sep 2025
Efficient Large Language Models with Zero-Shot Adjustable Acceleration
Sajjad Kachuee
M. Sharifkhani
158
0
0
01 Sep 2025
An LLM-enabled semantic-centric framework to consume privacy policies
Rui Zhao
Vladyslav Melnychuk
Jun Zhao
Jesse Wright
Nigel Shadbolt
144
0
0
01 Sep 2025