2009.03300
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 4,480 papers shown
FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models
Chuan Li
Qianyi Zhao
Fengran Mo
Cen Chen
LRM
132
0
0
07 Aug 2025
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Chengsong Huang
Wenhao Yu
Xiaoyang Wang
H. Zhang
Zongxia Li
Ruosen Li
J. Huang
Haitao Mi
Dong Yu
ReLM
SyDa
LRM
218
42
0
07 Aug 2025
IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
Xu Guo
Tianyi Liang
Tong Jian
Xiaogui Yang
Ling-I Wu
Chenhui Li
Z. Lu
Qipeng Guo
Kai Chen
275
2
0
06 Aug 2025
Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning
Magauiya Zhussip
Dmitriy Shopkhoev
Ammar Ali
Stamatios Lefkimmiatis
104
2
0
06 Aug 2025
Large Language Model's Multi-Capability Alignment in Biomedical Domain
Weilei Wang
Linqing Chen
Hanmeng Zhong
Wentao Wu
LM&MA
ELM
127
0
0
06 Aug 2025
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Yunan Zhang
Shuoran Jiang
Mengchen Zhao
Yuefeng Li
Yang Fan
Xiangping Wu
Qingcai Chen
KELM
CLL
135
1
0
06 Aug 2025
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
Yuquan Wang
Mi Zhang
Yining Wang
Geng Hong
Xiaoyu You
Min Yang
LRM
121
1
0
06 Aug 2025
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization
Negar Foroutan
Clara Meister
Debjit Paul
Joel Niklaus
Sina Ahmadi
Antoine Bosselut
Rico Sennrich
208
3
0
06 Aug 2025
Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning
Chang Tian
Matthew B. Blaschko
Mingzhe Xing
Xiuxing Li
Yinliang Yue
Marie-Francine Moens
OffRL
LRM
112
5
0
06 Aug 2025
ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis
Huiya Zhao
Yinghao Zhu
Zixiang Wang
Yasha Wang
Junyi Gao
Liantao Ma
105
0
0
06 Aug 2025
Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks
Zhiwen Ruan
Yun-Nung Chen
Yutao Hou
Peng Li
Yang Liu
Guanhua Chen
196
1
0
06 Aug 2025
Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs
Aryan Gulati
Brando Miranda
Eric Chen
Emily Xia
Kai Fronsdal
Bruno Dumont
Elyas Obbad
Sanmi Koyejo
AIMat
ReLM
LRM
370
7
0
05 Aug 2025
A Comparative Study of Neurosymbolic AI Approaches to Interpretable Logical Reasoning
Michael K. Chen
NAI
ELM
LRM
129
1
0
05 Aug 2025
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
He Xiao
Qingyao Yang
Dirui Xie
Wendong Xu
Wenyong Zhou
Haobo Liu
Zhengwu Liu
Ngai Wong
MQ
106
0
0
05 Aug 2025
RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging
The-Hai Nguyen
Dang Huu-Tien
Takeshi Suzuki
Le-Minh Nguyen
MoMe
274
2
0
05 Aug 2025
MoKA: Mixture of Kronecker Adapters
Mohammadreza Sadeghi
Mahsa Ghazvini Nejad
MirHamed Jafarzadeh Asl
Yu Gu
Yuanhao Yu
M. Asgharian
Vahid Partovi Nia
MoE
96
0
0
05 Aug 2025
Who is a Better Player: LLM against LLM
Yingjie Zhou
Jiezhang Cao
Farong Wen
Kepeng Xu
Yanwei Jiang
...
Yu Zhou
Xiongkuo Min
Jie Guo
Zicheng Zhang
Guangtao Zhai
137
0
0
05 Aug 2025
Test Set Quality in Multilingual LLM Evaluation
Kranti Chalamalasetti
Gabriel Bernier-Colborne
Yvan Gauthier
Sowmya Vajjala
ELM
151
1
0
04 Aug 2025
FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing
Shida Wang
Chaohu Liu
Yubo Wang
Linli Xu
KELM
254
3
0
04 Aug 2025
PentestJudge: Judging Agent Behavior Against Operational Requirements
Shane Caldwell
Max Harley
Michael Kouremetis
Vincent Abruzzo
Will Pearce
LLMAG
ELM
135
0
0
04 Aug 2025
ProCut: LLM Prompt Compression via Attribution Estimation
Zhentao Xu
Fengyi Li
Albert Chen
Xiaofeng Wang
171
1
0
04 Aug 2025
When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models
Keyu Wang
Jin Li
Shu Yang
Zhuoran Zhang
Haiyan Zhao
436
6
0
04 Aug 2025
The Architecture of Trust: A Framework for AI-Augmented Real Estate Valuation in the Era of Structured Data
Petteri Teikari
Mike Jarrell
Maryam Azh
Harri Pesola
172
1
0
04 Aug 2025
GrandJury: A Collaborative Machine Learning Model Evaluation Protocol for Dynamic Quality Rubrics
Arthur Cho
ALM
AILaw
ELM
134
0
0
04 Aug 2025
Trainable Dynamic Mask Sparse Attention
Jingze Shi
Yifan Wu
Yiran Peng
Liangdong Wang
Guang Liu
Yuyu Luo
332
3
0
04 Aug 2025
Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction
Yuerong Song
Xiaoran Liu
Ruixiao Li
Zhigeng Liu
Zengfeng Huang
Qipeng Guo
Ziwei He
Xipeng Qiu
160
20
0
04 Aug 2025
Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention
Xinhan Di
JoyJiaoW
LRM
107
2
0
03 Aug 2025
ROVER: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks
Philip Schroeder
Ondrej Biza
Thomas Weng
Hongyin Luo
James Glass
LM&Ro
LRM
159
0
0
03 Aug 2025
Quantum-RAG and PunGPT2: Advancing Low-Resource Language Generation and Retrieval for the Punjabi Language
Jaskaranjeet Singh
Rakesh Thakur
174
0
0
03 Aug 2025
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Wenxuan Wang
Zizhan Ma
Meidan Ding
S. Zheng
Shengyuan Liu
...
Jiaming Ji
Wenting Chen
Xiang Li
LinLin Shen
Yixuan Yuan
LRM
194
4
0
01 Aug 2025
MELAC: Massive Evaluation of Large Language Models with Alignment of Culture in Persian Language
Farhan Farsi
Farnaz Aghababaloo
Shahriar Shariati Motlagh
Parsa Ghofrani
MohammadAli SadraeiJavaheri
...
Amirhossein Shabani
Farbod Bijary
Ghazal Zamaninejad
Amirmohammad Salehoof
Saeedeh Momtazi
ELM
222
2
0
01 Aug 2025
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Zizhuo Zhang
Jianing Zhu
Xinmu Ge
Zihua Zhao
Zhanke Zhou
Xuan Li
Xiao Feng
Jiangchao Yao
Bo Han
ALM
LRM
288
0
0
01 Aug 2025
UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents
Jianqiang Xiao
Yuexuan Sun
Yixin Shao
Boxi Gan
Rongqiang Liu
Yanjing Wu
Weili Gua
Xiang Deng
272
0
0
01 Aug 2025
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
Jinsong Li
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
DiffM
155
12
0
01 Aug 2025
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
Sajana Weerawardhena
Paul Kassianik
Blaine Nelson
Baturay Saglam
Anu Vellore
...
Dhruv Kedia
Kojin Oshiba
Zhouran Yang
Yaron Singer
Amin Karbasi
ALM
ELM
180
4
0
01 Aug 2025
Calibrated Language Models and How to Find Them with Label Smoothing
J. Huang
Peng Lu
Qiuhao Zeng
236
1
0
01 Aug 2025
Lucy: edgerunning agentic web search on mobile with machine generated task vectors
Alan Dao
Dinh Bach Vu
Alex Nguyen
Norapat Buppodom
LRM
118
1
0
01 Aug 2025
LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring
Chloe Li
Mary Phuong
Noah Y. Siegel
ELM
445
4
0
31 Jul 2025
Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems
Lijia Liu
Takumi Kondo
Kyohei Atarashi
Koh Takeuchi
Jiyi Li
Shigeru Saito
H. Kashima
132
0
0
31 Jul 2025
Learning Like Humans: Resource-Efficient Federated Fine-Tuning through Cognitive Developmental Stages
Yebo Wu
Jingguang Li
Zhijiang Guo
Li Li
184
4
0
31 Jul 2025
Cascaded Information Disclosure for Generalized Evaluation of Problem Solving Capabilities
Yunxiang Yan
Tomohiro Sawada
Kartik Goyal
ELM
189
0
0
31 Jul 2025
TextQuests: How Good are LLMs at Text-Based Video Games?
Long Phan
Mantas Mazeika
Andy Zou
Dan Hendrycks
202
3
0
31 Jul 2025
EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes
Adam Block
Cyril Zhang
157
1
0
31 Jul 2025
DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System
Hui Yi Leong
Yuqing Wu
168
0
0
31 Jul 2025
CUS-QA: Local-Knowledge-Oriented Open-Ended Question Answering Dataset
Jindřich Libovický
Jindřich Helcl
Andrei-Alexandru Manea
Gianluca Vico
176
2
0
30 Jul 2025
BALSAM: A Platform for Benchmarking Arabic Large Language Models
Rawan N. Al-Matham
Kareem Darwish
Raghad Al-Rasheed
Waad Alshammari
Muneera Alhoshan
...
Sultana Alghurabi
Atikah Alzeghayer
Afrah Altamimi
Abdullah Alfaifi
Abdulrahman AlOsaimy
ELM
218
2
0
30 Jul 2025
Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
Kwesi Cobbina
Tianyi Zhou
123
2
0
30 Jul 2025
Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity
Xinwei Wu
Haojie Li
Hongyu Liu
Xinyu Ji
Ruohan Li
Yule Chen
Yigeng Zhang
154
1
0
30 Jul 2025
Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Tom Sühr
Florian E. Dorner
Olawale Salaudeen
Augustin Kelava
Samira Samadi
ALM
ELM
165
2
0
30 Jul 2025
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
Q. Guo
Wei Xie
Xiaofang Cai
Enze Wang
Shuoyoucheng Ma
Kai Chen
Xiaofeng Wang
Baosheng Wang
ELM
180
0
0
30 Jul 2025