2009.03300
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 4,428 papers shown
AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization
Mengtao Lv
Ruiqi Zhu
Xinyu Wang
Y. Li
MQ
80
0
0
16 Oct 2025
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Rahul Nadkarni
Yanai Elazar
Hila Gonen
Noah A. Smith
KELM
104
0
0
16 Oct 2025
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction
Xu Shen
Qi Zhang
Song Wang
Zhen Tan
Xinyu Zhao
...
Vaishnav Tadiparthi
Hossein Nourkhiz Mahjoub
Ehsan Moradi-Pari
Kwonjoon Lee
Tianlong Chen
109
0
0
16 Oct 2025
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Yingpeng Ning
Yuanyuan Sun
Ling Luo
Yanhua Wang
Yuchen Pan
Hongfei Lin
HILM
180
0
0
16 Oct 2025
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
Samuel Paech
Allen Roush
Judah Goldfeder
Ravid Shwartz-Ziv
136
0
0
16 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat
AI4TS
LRM
199
0
0
16 Oct 2025
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
Daniil Gurgurov
Josef van Genabith
Simon Ostermann
MoE
140
0
0
15 Oct 2025
BioMedSearch: A Multi-Source Biomedical Retrieval Framework Based on LLMs
Congying Liu
Xingyuan Wei
Peipei Liu
Yiqing Shen
Yanxu Mao
Tiehan Cui
76
0
0
15 Oct 2025
In-Distribution Steering: Balancing Control and Coherence in Language Model Generation
Arthur Vogels
Benjamin Wong
Yann Choho
A. Blangero
Milan Bhan
LLMSV
173
0
0
15 Oct 2025
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby
Ivan Lazarevich
Nish Sinnadurai
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
60
0
0
15 Oct 2025
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
Kehua Feng
Keyan Ding
Zhihui Zhu
Lei Liang
Qiang Zhang
H. Chen
LRM
115
0
0
15 Oct 2025
Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps
Ahmed Alzubaidi
Shaikha Alsuwaidi
Basma El Amel Boussaha
Leen AlQadi
Omar Alkaabi
Mohammed Alyafeai
Hamza Alobeidli
Hakim Hacid
ELM
102
1
0
15 Oct 2025
NOSA: Native and Offloadable Sparse Attention
Yuxiang Huang
Chaojun Xiao
Xu Han
Zhiyuan Liu
MQ
128
0
0
15 Oct 2025
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Xiaozhe Li
TianYi Lyu
Siyi Yang
Yuxi Gong
Yizhao Yang
Jinxuan Huang
Ligao Zhang
Zhuoyi Huang
Qingwen Liu
ELM
131
0
0
15 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
76
0
0
15 Oct 2025
GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models
Chen Zheng
Y. Cai
Deyi Liu
Jin Ma
Yiyuan Ma
Y. Yang
Jing Liu
Yutao Zeng
Xun Zhou
Siyuan Qiao
MoE
112
0
0
15 Oct 2025
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen
Sihang Zhou
Ke Liang
Duanyang Yuan
Haoyuan Chen
Xiaoyu Sun
Linyuan Meng
Xinwang Liu
ReLM
LRM
157
0
0
15 Oct 2025
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems
Jingru Lin
Chen Zhang
Stephen Y. Liu
Haizhou Li
RALM
96
0
0
15 Oct 2025
PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
Wei Zou
Yupei Liu
Yanting Wang
Ying Chen
Neil Zhenqiang Gong
Jinyuan Jia
AAML
154
0
0
15 Oct 2025
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
Anna Hedström
Salim I. Amoukou
Tom Bewley
Saumitra Mishra
Manuela Veloso
LLMSV
136
2
0
15 Oct 2025
Adaptive Reasoning Executor: A Collaborative Agent System for Efficient Reasoning
Zehui Ling
Deshu Chen
Yichi Zhang
Yuchen Liu
Xigui Li
Xin Guo
Yuan Cheng
LLMAG
LRM
60
0
0
15 Oct 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
Shrey Pandit
Austin Xu
Xuan-Phi Nguyen
Yifei Ming
Caiming Xiong
Shafiq Joty
LRM
96
2
0
15 Oct 2025
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu
Qiang Lu
Meichen Dong
Jake Luo
90
3
0
15 Oct 2025
Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM
Areej AlOtaibi
Lina Alyahya
Raghad Alshabanah
Shahad Alfawzan
Shuruq Alarefei
...
Waad Alahmed
Omar Talabay
Jalal Alowibdi
Salem Alelyani
Adel Bibi
133
0
0
15 Oct 2025
Evolution of Meta's LLaMA models and parameter-efficient fine-tuning of large language models: a survey
Abdulhady Abas Abdullah
Arkaitz Zubiaga
Seyedali Mirjalili
Amir Gandomi
Fatemeh Daneshfar
Mohammadsadra Amini
Alan Salam Mohammed
Hadi Veisi
ALM
140
0
0
14 Oct 2025
Dr.LLM: Dynamic Layer Routing in LLMs
Ahmed Heakl
Martin Gubri
Salman Khan
Sangdoo Yun
Seong Joon Oh
ReLM
281
1
1
14 Oct 2025
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Hancheng Ye
Zhengqi Gao
Mingyuan Ma
Qinsi Wang
Yuzhe Fu
...
Yueqian Lin
Zhijian Liu
Jianyi Zhang
Danyang Zhuo
Yiran Chen
VLM
99
1
0
14 Oct 2025
Beyond Consensus: Mitigating the Agreeableness Bias in LLM Judge Evaluations
Suryaansh Jain
Umair Z. Ahmed
Shubham Sahai
Ben Leong
40
1
0
13 Oct 2025
LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance
Patrick Haller
Mark Ibrahim
Polina Kirichenko
Levent Sagun
Samuel J. Bell
KELM
66
0
0
13 Oct 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian
Sang-gil Lee
Zhifeng Kong
Sreyan Ghosh
Arushi Goel
...
Shinji Watanabe
Mohammad Shoeybi
Bryan Catanzaro
Rafael Valle
Wei Ping
AuLLM
LRM
205
1
0
13 Oct 2025
Neural Weight Compression for Language Models
Jegwang Ryu
Minkyu Kim
Seungjun Shin
Hee Min Choi
Dokwan Oh
Jaeho Lee
80
0
0
13 Oct 2025
APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
Z. Li
Yuege Feng
Dandan Guo
Jinpeng Hu
Anningzhe Gao
Xiang Wan
76
0
0
13 Oct 2025
Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks
Wenya Xie
Qingying Xiao
Yu Zheng
Xidong Wang
Junying Chen
...
Anningzhe Gao
Prayag Tiwari
Xiang Wan
Feng Jiang
Benyou Wang
LM&MA
124
0
0
13 Oct 2025
MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models
Bo Cheng
Xu Wang
Jinda Liu
Yi-Ju Chang
Yuan Wu
MoE
ALM
116
0
0
13 Oct 2025
LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens
A. Zebaze
Rachel Bawden
Benoît Sagot
LRM
72
1
0
13 Oct 2025
PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature
Daoyu Wang
Mingyue Cheng
Qi Liu
Shuo Yu
Zirui Liu
Ze Guo
LRM
141
1
0
13 Oct 2025
ADVICE: Answer-Dependent Verbalized Confidence Estimation
Ki Jung Seo
Sehun Lim
Taeuk Kim
20
0
0
13 Oct 2025
Balancing Synthetic Data and Replay for Enhancing Task-Specific Capabilities
Urs Spiegelhalter
Jorg K. H. Franke
Frank Hutter
CLL
KELM
112
0
0
13 Oct 2025
LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models
Yiwei Liu
Y. Li
Xiao Li
Gong Cheng
LRM
48
0
0
13 Oct 2025
DND: Boosting Large Language Models with Dynamic Nested Depth
Tieyuan Chen
Xiaodong Chen
Haoxing Chen
Zhenzhong Lan
W. Lin
Jianguo Li
MoE
145
0
0
13 Oct 2025
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
Z. Chen
Yiming Zhang
Hengguang Zhou
Zenghui Ding
Yining Sun
Cho-Jui Hsieh
OffRL
ALM
ELM
73
0
0
12 Oct 2025
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
Shaobo Wang
C. Wang
Wenjie Fu
Yue Min
Mingquan Feng
...
Kexin Yang
Xingzhang Ren
Fei Huang
Dayiheng Liu
Linfeng Zhang
100
0
0
12 Oct 2025
D3MAS: Decompose, Deduce, and Distribute for Enhanced Knowledge Sharing in Multi-Agent Systems
Heng Zhang
Yuling Shi
Xiaodong Gu
Haochen You
Zijian Zhang
Lubin Gan
Yilei Yuan
Jin Huang
56
0
0
12 Oct 2025
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
Zichun Yu
Chenyan Xiong
OnRL
156
0
0
12 Oct 2025
Trace Length is a Simple Uncertainty Signal in Reasoning Models
Siddartha Devic
Charlotte Peale
Arwen Bradley
Sinead Williamson
Preetum Nakkiran
Aravind Gollakota
LRM
104
0
0
12 Oct 2025
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
Gunho Park
Jeongin Bae
Beomseok Kwon
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
120
1
0
12 Oct 2025
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Heng Zhang
Yuling Shi
Xiaodong Gu
Zijian Zhang
Haochen You
Lubin Gan
Yilei Yuan
Jin Huang
64
0
0
12 Oct 2025
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision
Hongjie Zheng
Zesheng Shi
Ping Yi
62
0
0
12 Oct 2025
SASER: Stego attacks on open-source LLMs
Ming Tan
Wei Li
Hu Tao
Hailong Ma
Aodi Liu
Qian Chen
Zilong Wang
AAML
102
0
0
12 Oct 2025
Harnessing Consistency for Robust Test-Time LLM Ensemble
Zhichen Zeng
Qi Yu
Xiao Lin
Ruizhong Qiu
Xuying Ning
Tianxin Wei
Yuchen Yan
Jingrui He
Hanghang Tong
64
0
0
12 Oct 2025