2009.03300
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing "Measuring Massive Multitask Language Understanding"
50 / 4,456 papers shown
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
160
0
0
24 Dec 2025
SoK: Are Watermarks in LLMs Ready for Deployment?
Kieu Dang
Phung Lai
Nhathai Phan
Yelong Shen
Ruoming Jin
Abdallah Khreishah
My T. Thai
143
1
0
24 Dec 2025
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing
Yichi Zhang
Fangzheng Xie
Shu Yang
Chong Wu
84
0
0
24 Dec 2025
LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems
Yuanhe Zhang
Weiliu Wang
Zhenhong Zhou
Kun Wang
Jie Zhang
Li Sun
Yang Liu
Sen Su
20
0
0
02 Dec 2025
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
Róbert Belanec
Ivan Srba
Maria Bielikova
ALM
308
0
0
02 Dec 2025
Lumos: Let there be Language Model System Certification
Isha Chaudhary
Vedaant V. Jain
Avaljot Singh
Kavya Sachdeva
Sayan Ranu
Gagandeep Singh
20
0
0
02 Dec 2025
When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers
Jack Lu
Ryan Teehan
Jinran Jin
Mengye Ren
LRM
88
0
0
02 Dec 2025
Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
Amr Mohamed
Yang Zhang
Michalis Vazirgiannis
Guokan Shang
AI4CE
60
0
0
02 Dec 2025
Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
Kuangpu Guo
Yuhe Ding
Jian Liang
Zilei Wang
Ran He
MoMe
61
0
0
01 Dec 2025
Rectifying LLM Thought from Lens of Optimization
J. Liu
Hongwei Liu
Songyang Zhang
Kai Chen
LRM
52
0
0
01 Dec 2025
InstructLR: A Scalable Approach to Create Instruction Dataset for Under-Resourced Languages
Mamadou K. Keita
Sébastien Diarra
Christopher Homan
Seydou Diallo
8
0
0
01 Dec 2025
KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference
Sai Gokhale
Devleena Das
Rajeev Patwari
Ashish Sirasao
Elliott Delaye
MQ
268
0
0
01 Dec 2025
EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
Guoqing Ma
Jia Zhu
Hanghui Guo
Weijie Shi
Yue Cui
Jiawei Shen
Zilong Li
Yidan Liang
AI4Ed
ELM
272
0
0
29 Nov 2025
Breaking It Down: Domain-Aware Semantic Segmentation for Retrieval Augmented Generation
Aparajitha Allamraju
Maitreya Prafulla Chitale
Hiranmai Sri Adibhatla
Rahul Mishra
Manish Shrivastava
16
0
0
29 Nov 2025
OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning
Timothy Ossowski
Sheng Zhang
Qianchu Liu
Guanghui Qin
Reuben Tan
Tristan Naumann
Junjie Hu
Hoifung Poon
LRM
164
0
0
28 Nov 2025
AgentShield: Make MAS more secure and efficient
Kaixiang Wang
Zhaojiacheng Zhou
Bunyod Suvonov
Jiong Lou
Jie Li
AAML
92
0
0
28 Nov 2025
MathSight: A Benchmark Exploring Have Vision-Language Models Really Seen in University-Level Mathematical Reasoning?
Yuandong Wang
Yao Cui
Yuxin Zhao
Zhen Yang
Yangfu Zhu
Zhenzhou Shao
CoGe
VLM
LRM
132
0
0
28 Nov 2025
A Rosetta Stone for AI Benchmarks
A. Ho
Jean-Stanislas Denain
David Atanasov
Samuel Albanie
Rohin Shah
ELM
168
0
0
28 Nov 2025
Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day
Milad Abdollahzadeh
Abdul Raheem
Zilong Zhao
Uzair Javaid
Kevin Yee
Nalam Venkata Abhishek
Tram Truong-Huu
Biplab Sikdar
LMTD
ALM
159
0
0
28 Nov 2025
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu
Zhanchao Zhou
Ruiqi Liang
Zehuan Li
Wei Wu
Jianguo Li
92
0
0
28 Nov 2025
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
Guanxi Lu
Hao Mark Chen
Zhiqiang Que
Wayne Luk
Hongxiang Fan
MQ
80
0
0
27 Nov 2025
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal
Sebastian Schmidt
Thorsten Graf
Thorsten Bagodonat
Stephan Günnemann
Leo Schwinn
16
0
0
27 Nov 2025
Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
Zhenchao Tang
Fang Wang
Haohuai He
Jiale Zhou
Tianxu Lv
...
Minghao Yang
Y. Wang
Jiayang Wu
Yidong Song
J. Yao
CLL
394
0
0
26 Nov 2025
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Dongyang Fan
Diba Hashemi
Sai Praneeth Karimireddy
Martin Jaggi
97
0
0
26 Nov 2025
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Yeganeh Kordi
Nihal V. Nayak
Max Zuo
Ilana Nguyen
Stephen H. Bach
112
0
0
26 Nov 2025
Subjective Depth and Timescale Transformers: Learning Where and When to Compute
Frederico Wieser
Martin A Benfeghoul
Haitham Bou-Ammar
Jun Wang
Zafeirios Fountas
102
0
0
26 Nov 2025
Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices
Paulo Cavalin
Cassia Sanctos
Marcelo Grave
Claudio S. Pinhanez
Yago Primerano
32
0
0
26 Nov 2025
SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Peiran Xu
Sudong Wang
Yao Zhu
Jianing Li
Yunjian Zhang
LRM
286
0
0
26 Nov 2025
On the Limits of Innate Planning in Large Language Models
Charles Schepanowski
Charles Ling
LLMAG
LRM
ELM
405
0
0
26 Nov 2025
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Róbert Belanec
Branislav Pecher
Ivan Srba
Maria Bielikova
103
1
0
26 Nov 2025
Representation Interventions Enable Lifelong Unstructured Knowledge Control
Xuyuan Liu
Zhengzhang Chen
Xinshuai Dong
Yanchi Liu
Xujiang Zhao
Shengyu Chen
Haoyu Wang
Yujun Yan
Haifeng Chen
KELM
72
0
0
25 Nov 2025
Vision-Language Memory for Spatial Reasoning
Zuntao Liu
Yi Du
Taimeng Fu
Shaoshu Su
Cherie Ho
Chen Wang
VLM
LRM
189
0
0
25 Nov 2025
Mirror, Mirror on the Wall -- Which is the Best Model of Them All?
Dina Sayed
Heiko Schuldt
20
0
0
25 Nov 2025
Geometry of Decision Making in Language Models
Abhinav Joshi
Divyanshu Bhatt
Ashutosh Modi
AI4CE
LRM
262
0
0
25 Nov 2025
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu
Mingkuan Zhao
Shuangyong Song
Xiaoyan Zhu
Xin Lai
Jiayin Wang
95
1
0
25 Nov 2025
Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries
Sree Bhattacharyya
Yaman Kumar Singla
Sudhir Yarram
Somesh Singh
Harini S I
James Z. Wang
88
0
0
25 Nov 2025
Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy
Daniel I Jackson
Emma L Jensen
Syed-Amad Hussain
Emre Sezgin
AI4MH
ELM
267
0
0
25 Nov 2025
Structured Prompting Enables More Robust Evaluation of Language Models
Asad Aali
Muhammad Ahmed Mohsin
Vasiliki Bikia
Arnav Singhvi
Richard Gaus
...
Sanmi Koyejo
Emily Alsentzer
Christopher Potts
N. Shah
Akshay Chaudhari
ELM
LRM
138
0
0
25 Nov 2025
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
Yuanhao Li
Mingshan Liu
Hongbo Wang
Yiding Zhang
Yifei Ma
Wei Tan
AI4TS
KELM
LRM
AI4CE
370
0
0
25 Nov 2025
BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
Abdullah Al Sefat
132
0
0
25 Nov 2025
Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores
Congren Dai
Yue Yang
Krinos Li
Huichi Zhou
Shijie Liang
...
Peiyuan Jing
Kinhei Lee
Zhenxuan Zhang
Xiaobing Li
Maosong Sun
64
0
0
24 Nov 2025
EAGER: Edge-Aligned LLM Defense for Robust, Efficient, and Accurate Cybersecurity Question Answering
Onat Gungor
Roshan Sood
Jiasheng Zhou
T. Rosing
AAML
41
0
0
24 Nov 2025
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Junbo Zhang
Ran Chen
Qianli Zhou
Xinyang Deng
Wen Jiang
145
1
0
24 Nov 2025
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Kairong Luo
Zhenbo Sun
Haodong Wen
Xinyu Shi
Jiarui Cui
Chenyi Dang
Kaifeng Lyu
Wenguang Chen
143
1
0
24 Nov 2025
Doubly Wild Refitting: Model-Free Evaluation of High Dimensional Black-Box Predictions under Convex Losses
Haichen Hu
David Simchi-Levi
64
0
0
24 Nov 2025
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li
Y. Li
Hanxun Huang
Yunhao Chen
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
MLLM
AAML
VLM
172
0
0
24 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
122
0
0
24 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG
MU
MoE
LRM
340
0
0
23 Nov 2025
SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators
Swastik Bhattacharya
Sanjay Das
Anand Menon
Shamik Kundu
Arnab Raha
K. Basu
8
0
0
23 Nov 2025
Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets
Gowtham
Sai Rupesh
Sanjay Kumar
Saravanan
Venkata Chaithanya
VLM
177
0
0
22 Nov 2025