ResearchTrend.AI
Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
arXiv: 2009.03300

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Rahul Nadkarni
Yanai Elazar
Hila Gonen
Noah A. Smith
KELM
152
0
0
16 Oct 2025
NOSA: Native and Offloadable Sparse Attention
Yuxiang Huang
Chaojun Xiao
Xu Han
Zhiyuan Liu
Zhou Su
...
Hengyu Zhao
Yudong Wang
Chaojun Xiao
Xu Han
Zhiyuan Liu
MQ
175
0
0
15 Oct 2025
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen
Sihang Zhou
Ke Liang
Duanyang Yuan
Haoyuan Chen
Xiaoyu Sun
Linyuan Meng
Xinwang Liu
ReLM, LRM
231
0
0
15 Oct 2025
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
Anna Hedström
Salim I. Amoukou
Tom Bewley
Saumitra Mishra
Manuela Veloso
LLMSV
219
2
0
15 Oct 2025
BioMedSearch: A Multi-Source Biomedical Retrieval Framework Based on LLMs
Congying Liu
Xingyuan Wei
Peipei Liu
Yiqing Shen
Yanxu Mao
Tiehan Cui
139
0
0
15 Oct 2025
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Xiaozhe Li
TianYi Lyu
Siyi Yang
Yuxi Gong
Yizhao Yang
Jinxuan Huang
Ligao Zhang
Zhuoyi Huang
Qingwen Liu
ELM
204
0
0
15 Oct 2025
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby
Ivan Lazarevich
Nish Sinnadurai
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
122
3
0
15 Oct 2025
GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models
Chen Zheng
Y. Cai
Deyi Liu
Jin Ma
Yiyuan Ma
Y. Yang
Jing Liu
Yutao Zeng
Xun Zhou
Siyuan Qiao
MoE
196
0
0
15 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
122
0
0
15 Oct 2025
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
Daniil Gurgurov
Josef van Genabith
Simon Ostermann
MoE
202
0
0
15 Oct 2025
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu
Qiang Lu
Meichen Dong
Jake Luo
141
3
0
15 Oct 2025
In-Distribution Steering: Balancing Control and Coherence in Language Model Generation
Arthur Vogels
Benjamin Wong
Yann Choho
A. Blangero
Milan Bhan
LLMSV
228
0
0
15 Oct 2025
PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
Wei Zou
Yupei Liu
Yanting Wang
Ying Chen
Neil Zhenqiang Gong
Jinyuan Jia
AAML
212
0
0
15 Oct 2025
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
Kehua Feng
Keyan Ding
Zhihui Zhu
Lei Liang
Qiang Zhang
H. Chen
LRM
184
0
0
15 Oct 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
Shrey Pandit
Austin Xu
Xuan-Phi Nguyen
Yifei Ming
Caiming Xiong
Shafiq Joty
LRM
191
3
0
15 Oct 2025
Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps
Ahmed Alzubaidi
Shaikha Alsuwaidi
Basma El Amel Boussaha
Leen AlQadi
Omar Alkaabi
Mohammed Alyafeai
Hamza Alobeidli
Hakim Hacid
ELM
163
1
0
15 Oct 2025
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems
Jingru Lin
Chen Zhang
Stephen Y. Liu
Haizhou Li
RALM
124
1
0
15 Oct 2025
Adaptive Reasoning Executor: A Collaborative Agent System for Efficient Reasoning
Zehui Ling
Deshu Chen
Yichi Zhang
Yuchen Liu
Xigui Li
Xin Guo
Yuan Cheng
LLMAG, LRM
110
0
0
15 Oct 2025
Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM
Areej AlOtaibi
Lina Alyahya
Raghad Alshabanah
Shahad Alfawzan
Shuruq Alarefei
...
Waad Alahmed
Omar Talabay
Jalal Alowibdi
Salem Alelyani
Adel Bibi
202
0
0
15 Oct 2025
Dr.LLM: Dynamic Layer Routing in LLMs
Ahmed Heakl
Martin Gubri
Salman Khan
Sangdoo Yun
Seong Joon Oh
ReLM
378
1
1
14 Oct 2025
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Hancheng Ye
Zhengqi Gao
Mingyuan Ma
Qinsi Wang
Yuzhe Fu
...
Yueqian Lin
Zhijian Liu
Jianyi Zhang
Danyang Zhuo
Yiran Chen
VLM
169
1
0
14 Oct 2025
Evolution of Meta's LLaMA models and parameter-efficient fine-tuning of large language models: a survey
Abdulhady Abas Abdullah
Arkaitz Zubiaga
Seyedali Mirjalili
Amir Gandomi
Fatemeh Daneshfar
Mohammadsadra Amini
Alan Salam Mohammed
Hadi Veisi
ALM
193
0
0
14 Oct 2025
LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens
A. Zebaze
Rachel Bawden
Benoît Sagot
LRM
146
1
0
13 Oct 2025
ADVICE: Answer-Dependent Verbalized Confidence Estimation
Ki Jung Seo
Sehun Lim
Taeuk Kim
70
0
0
13 Oct 2025
Beyond Consensus: Mitigating the Agreeableness Bias in LLM Judge Evaluations
Suryaansh Jain
Umair Z. Ahmed
Shubham Sahai
Ben Leong
99
2
0
13 Oct 2025
PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature
Daoyu Wang
Mingyue Cheng
Qi Liu
Shuo Yu
Zirui Liu
Ze Guo
Qi Liu
LRM
308
4
0
13 Oct 2025
LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance
Patrick Haller
Mark Ibrahim
Polina Kirichenko
Levent Sagun
Samuel J. Bell
KELM
135
1
0
13 Oct 2025
Neural Weight Compression for Language Models
Jegwang Ryu
Minkyu Kim
Seungjun Shin
Hee Min Choi
Dokwan Oh
Jaeho Lee
140
0
0
13 Oct 2025
DND: Boosting Large Language Models with Dynamic Nested Depth
Tieyuan Chen
Xiaodong Chen
Haoxing Chen
Zhenzhong Lan
W. Lin
Jianguo Li
MoE
237
0
0
13 Oct 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian
Sang-gil Lee
Zhifeng Kong
Sreyan Ghosh
Arushi Goel
...
Shinji Watanabe
Mohammad Shoeybi
Bryan Catanzaro
Rafael Valle
Wei Ping
AuLLM, LRM
290
1
0
13 Oct 2025
Balancing Synthetic Data and Replay for Enhancing Task-Specific Capabilities
Urs Spiegelhalter
Jorg K. H. Franke
Frank Hutter
CLL, KELM
140
0
0
13 Oct 2025
LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models
Yiwei Liu
Y. Li
Xiao Li
Gong Cheng
LRM
75
1
0
13 Oct 2025
Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks
Wenya Xie
Qingying Xiao
Yu Zheng
Xidong Wang
Junying Chen
...
Anningzhe Gao
Prayag Tiwari
Xiang Wan
Feng Jiang
Benyou Wang
LM&MA
201
0
0
13 Oct 2025
MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models
Bo Cheng
Xu Wang
Jinda Liu
Yi-Ju Chang
Yuan Wu
MoE, ALM
176
0
0
13 Oct 2025
APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
Z. Li
Yuege Feng
Dandan Guo
Jinpeng Hu
Anningzhe Gao
Xiang Wan
127
2
0
13 Oct 2025
Harnessing Consistency for Robust Test-Time LLM Ensemble
Zhichen Zeng
Qi Yu
Xiao Lin
Ruizhong Qiu
Xuying Ning
Tianxin Wei
Yuchen Yan
Jingrui He
Hanghang Tong
147
2
0
12 Oct 2025
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
Zichun Yu
Chenyan Xiong
OnRL
236
0
0
12 Oct 2025
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Heng Zhang
Yuling Shi
Xiaodong Gu
Zijian Zhang
Haochen You
Lubin Gan
Yilei Yuan
Jin Huang
137
0
0
12 Oct 2025
D3MAS: Decompose, Deduce, and Distribute for Enhanced Knowledge Sharing in Multi-Agent Systems
Heng Zhang
Yuling Shi
Xiaodong Gu
Haochen You
Zijian Zhang
Lubin Gan
Yilei Yuan
Jin Huang
136
0
0
12 Oct 2025
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
Gunho Park
Jeongin Bae
Beomseok Kwon
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
201
1
0
12 Oct 2025
Trace Length is a Simple Uncertainty Signal in Reasoning Models
Siddartha Devic
Charlotte Peale
Arwen Bradley
Sinead Williamson
Preetum Nakkiran
Aravind Gollakota
LRM
148
1
0
12 Oct 2025
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision
Hongjie Zheng
Zesheng Shi
Ping Yi
132
0
0
12 Oct 2025
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
Shaobo Wang
C. Wang
Wenjie Fu
Yue Min
Mingquan Feng
...
Kexin Yang
Xingzhang Ren
Fei Huang
Dayiheng Liu
Linfeng Zhang
156
0
0
12 Oct 2025
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
Z. Chen
Yiming Zhang
Hengguang Zhou
Zenghui Ding
Yining Sun
Cho-Jui Hsieh
OffRL, ALM, ELM
118
0
0
12 Oct 2025
SASER: Stego attacks on open-source LLMs
Ming Tan
Wei Li
Hu Tao
Hailong Ma
Aodi Liu
Qian Chen
Zilong Wang
AAML
171
0
0
12 Oct 2025
Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
Guanbin Li
Miao Yu
Moayad Aloqaily
Zhenhong Zhou
Kun Wang
Linsey Pang
Prakhar Mehrotra
Qingsong Wen
AAML
81
0
0
11 Oct 2025
EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing
Sicheng Lyu
Yu Gu
Xinyu Wang
Jerry Huang
Sitao Luan
Yufei Cui
Xiao-Wen Chang
Peng Lu
KELM
85
0
0
11 Oct 2025
CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models
Zhuxuanzi Wang
Mingqiao Mo
Xi Xiao
Chen Liu
Chenrui Ma
Yunbei Zhang
Xiao Wang
Smita Krishnaswamy
Tianyang Wang
136
0
0
11 Oct 2025
PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
Manjiang Yu
Hongji Li
Priyanka Singh
X. Li
Di Wang
Lijie Hu
LLMSV
305
4
0
11 Oct 2025
MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Hongwei Chen
Yishu Lei
Dan Zhang
Bo Ke
Danxiang Zhu
...
Shikun Feng
Jingzhou He
Yu Sun
Hua Wu
Haifeng Wang
ReLM, LRM
140
0
0
11 Oct 2025
Page 7 of 90