2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
HuggingFace (3 upvotes)
Papers citing "Measuring Massive Multitask Language Understanding"
50 / 4,428 papers shown
Title
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman
Valentin Hofmann
Ian H. Magnusson
Yuling Gu
Noah A. Smith
Hannaneh Hajishirzi
Kyle Lo
Jesse Dodge
ALM
88
3
0
18 Aug 2025
Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models
Ziqian Bi
Keyu Chen
Chiung-Yi Tseng
Danyang Zhang
Tianyang Wang
...
Junming Huang
Jibin Guan
Junfeng Hao
Junhao Song
ELM
170
2
0
17 Aug 2025
ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models
Yuanfeng Xu
Zehui Dai
Jian Liang
Jiapeng Guan
Guangrun Wang
Liang Lin
Xiaohui Lv
LLMAG
LRM
92
0
0
17 Aug 2025
The Self-Execution Benchmark: Measuring LLMs' Attempts to Overcome Their Lack of Self-Execution
Elon Ezra
Ariel Weizman
Amos Azaria
LRM
64
0
0
17 Aug 2025
ZigzagAttention: Efficient Long-Context Inference with Exclusive Retrieval and Streaming Heads
Zhuorui Liu
Chen Zhang
Dawei Song
36
1
0
17 Aug 2025
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
Jinyi Han
Tingyun Li
Shisong Chen
Jie Shi
X. Wang
...
Jiaqing Liang
Xin Lin
Liqian Wen
Zulong Chen
Yanghua Xiao
72
2
0
16 Aug 2025
Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models
Yuan Li
Zhengzhong Liu
Eric P. Xing
68
1
0
16 Aug 2025
AgentCDM: Enhancing Multi-Agent Collaborative Decision-Making via ACH-Inspired Structured Reasoning
Xuyang Zhao
Shiwan Zhao
Hualong Yu
Liting Zhang
Qicheng Li
LRM
AI4CE
58
2
0
16 Aug 2025
QuarkMed Medical Foundation Model Technical Report
A. Li
Bin Yan
Bingfeng Cai
Chenxi Li
Cunzhong Zhao
...
Xin Shang
Yao Wu
Yu Cao
Zhenxin Ma
Zhuang Jia
MedIm
LM&MA
135
0
0
16 Aug 2025
Mitigating Jailbreaks with Intent-Aware LLMs
Wei Jie Yeo
Frank Xing
Erik Cambria
AAML
92
0
0
16 Aug 2025
Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Kangyu Wang
Hongliang He
Lin Liu
Ruiqi Liang
Zhenzhong Lan
Jianguo Li
ALM
ELM
98
0
0
15 Aug 2025
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Rui Bao
Nan Xue
Yaping Sun
Zhiyong Chen
54
1
0
15 Aug 2025
Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
Sylvio Rüdian
Yassin Elsir
Marvin Kretschmer
Sabine Cayrou
Niels Pinkwart
56
0
0
15 Aug 2025
Speciesism in AI: Evaluating Discrimination Against Animals in Large Language Models
Monika Jotautaitė
Lucius Caviola
David A. Brewster
Thilo Hagendorff
116
0
0
15 Aug 2025
Personalized Distractor Generation via MCTS-Guided Reasoning Reconstruction
Tao Wu
Jingyuan Chen
Wang Lin
Jian Zhan
Mengze Li
Kun Kuang
Fei Wu
AI4Ed
LRM
239
1
0
15 Aug 2025
Every 28 Days the AI Dreams of Soft Skin and Burning Stars: Scaffolding AI Agents with Hormones and Emotions
Leigh Levinson
Christopher J. Agostino
48
0
0
15 Aug 2025
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Mikhail Seleznyov
Mikhail Chaichuk
Gleb Ershov
Alexander Panchenko
Elena Tutubalina
Oleg Somov
87
4
0
15 Aug 2025
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
Pratyush Maini
Vineeth Dorna
Aldo Carranza
Fan Pan
...
Spandan Das
Zhengping Wang
Bogdan Gaza
Ari S. Morcos
Matthew L. Leavitt
SyDa
76
0
0
14 Aug 2025
Robot Policy Evaluation for Sim-to-Real Transfer: A Benchmarking Perspective
Xuning Yang
Clemens Eppner
Jonathan Tremblay
Dieter Fox
Stan Birchfield
Fabio Ramos
100
2
0
14 Aug 2025
Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs
Xiangqi Jin
Y. Wang
Yifeng Gao
Zichen Wen
Biqing Qi
Dongrui Liu
Linfeng Zhang
LRM
120
6
0
14 Aug 2025
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Xinyan Jiang
L. Zhang
Jiayi Zhang
Qingsong Yang
Guimin Hu
Di Wang
Lijie Hu
LLMSV
275
1
0
14 Aug 2025
Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development
Sattvik Sahai
Prasoon Goyal
Michael Johnston
Anna Gottardi
Yao Lu
...
Lavina Vaz
Leslie Ball
Maureen Murray
Rahul Gupta
Shankar Ananthakrishna
85
1
0
13 Aug 2025
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Bin Hong
Jiayu Liu
Zhenya Huang
Kai Zhang
Mengdi Zhang
LRM
130
0
0
13 Aug 2025
Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks
Nouar Aldahoul
Yasir Zaki
LM&MA
AI4MH
ELM
114
0
0
13 Aug 2025
Methodological Framework for Quantifying Semantic Test Coverage in RAG Systems
Noah Broestl
Adel Nasser Abdalla
Rajprakash Bale
Hersh Gupta
Max Struever
19
0
0
13 Aug 2025
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
Birong Pan
Mayi Xu
Qiankun Pi
Jianhao Chen
Yuanyuan Zhu
Ming Zhong
T. Qian
100
0
0
13 Aug 2025
mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning
Nghia Trung Ngo
Franck Dernoncourt
T. Nguyen
LRM
130
0
0
13 Aug 2025
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo
Shuo Shao
Su Zhang
Lijing Zhou
Yuke Hu
Chenxu Zhao
Zhihao Liu
Zhan Qin
128
3
0
13 Aug 2025
EffiEval: Efficient and Generalizable Model Evaluation via Capability Coverage Maximization
Yaoning Wang
Jiahao Ying
Yixin Cao
Yubo Ma
Yugang Jiang
ELM
32
2
0
13 Aug 2025
BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them
Sekh Mainul Islam
Nadav Borenstein
Siddhesh Pawar
Haeun Yu
Arnav Arora
Isabelle Augenstein
146
1
0
12 Aug 2025
TiMoE: Time-Aware Mixture of Language Experts
Robin Faro
Dongyang Fan
Tamar Alphaidze
Martin Jaggi
MoE
105
1
0
12 Aug 2025
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
Charles O'Neill
Mudith Jayasekara
Max Kirkby
81
0
0
12 Aug 2025
Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks
Adit Krishnan
Chu Wang
Chris Kong
65
0
0
12 Aug 2025
IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization
Yuzhuo Bai
Shitong Duan
Muhua Huang
Jing Yao
Zhenghao Liu
Peng Zhang
Tun Lu
Xiaoyuan Yi
Maosong Sun
Xing Xie
120
1
0
12 Aug 2025
InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
Peiji Li
Jiasheng Ye
Yongkang Chen
Yichuan Ma
Zijie Yu
...
Linyang Li
Qipeng Guo
Dahua Lin
Bowen Zhou
Kai Chen
LLMAG
ALM
LRM
92
10
0
12 Aug 2025
SinLlama -- A Large Language Model for Sinhala
Moratuwa Engineering Research Conference (MERCon), 2025
H.W.K.Aravinda
Rashad Sirajudeen
Samith Karunathilake
Nisansa de Silva
Surangika Ranathunga
Rishemjit Kaur
LRM
180
1
0
12 Aug 2025
A Survey on Training-free Alignment of Large Language Models
Birong Pan
Yongqi Li
Jiasheng Si
Sibo Wei
Mayi Xu
Shen Zhou
Yuanyuan Zhu
Ming Zhong
T. Qian
3DV
LM&MA
296
0
0
12 Aug 2025
Scaling Up Active Testing to Large Language Models
Gabrielle Berrada
Jannik Kossen
Muhammed Razzak
Freddie Bickford-Smith
Y. Gal
Tom Rainforth
ALM
128
1
0
12 Aug 2025
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
Junjie Ye
C. Jiang
Zhengyin Du
Yufei Xu
Xuesong Yao
...
Xiaoran Fan
Qi Zhang
Tao Gui
Xuanjing Huang
Jiecao Chen
KELM
OffRL
120
4
0
12 Aug 2025
AgriGPT: a Large Language Model Ecosystem for Agriculture
Bo Yang
Yu Zhang
Lanfei Feng
Yunkui Chen
J. Zhang
...
Yuxuan Chen
Guijun Yang
Yong He
Runhe Huang
Shijian Li
LLMAG
KELM
166
4
0
12 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
163
3
0
11 Aug 2025
Evaluating Large Language Models as Expert Annotators
Yu-Min Tseng
Wei-Lin Chen
Chung-Chi Chen
Hsin-Hsi Chen
92
1
0
11 Aug 2025
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute
Ravikumar Balakrishnan
LLMSV
60
0
0
11 Aug 2025
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Zihe Liu
Jiashun Liu
Yancheng He
Weixun Wang
Jiaheng Liu
...
Siran Yang
Jiamang Wang
Yuchi Xu
Bo Zheng
B. Zheng
OffRL
80
20
0
11 Aug 2025
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
Haoyuan Wu
Haoxing Chen
Xiaodong Chen
Zhanchao Zhou
Tieyuan Chen
...
Junbo Zhao
Lin Liu
Zhenzhong Lan
Bei Yu
Jianguo Li
MoE
108
4
0
11 Aug 2025
OverFill: Two-Stage Models for Efficient Language Model Decoding
Woojeong Kim
Junxiong Wang
Jing Nathan Yan
Mohamed S. Abdelfattah
Alexander M Rush
72
0
0
11 Aug 2025
Capabilities of GPT-5 on Multimodal Medical Reasoning
Shansong Wang
Mingzhe Hu
Qiang Li
Mojtaba Safari
Xiaofeng Yang
ELM
LM&MA
AI4MH
LRM
115
27
0
11 Aug 2025
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang
Dongryeol Lee
Taegwan Kang
Yongil Kim
Kyomin Jung
AAML
ELM
96
1
0
11 Aug 2025
HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways
Cristian Cosentino
Annamaria Defilippo
Marco Dossena
Christopher Irwin
Sara Joubbi
Pietro Liò
LM&MA
AI4MH
116
0
0
10 Aug 2025
Benchmarking for Domain-Specific LLMs: A Case Study on Academia and Beyond
Rubing Chen
Jiaxin Wu
Jian Wang
Xulu Zhang
Wenqi Fan
Chenghua Lin
Xiao-Yong Wei
Qing Li
ALM
188
0
0
10 Aug 2025