Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown
Investigating LLM Capabilities on Long Context Comprehension for Medical Question Answering
Feras AlMannaa
Talia Tseriotou
Jenny Chim
Maria Liakata
ELM
202
0
0
21 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta
Jay Yeung
Gopala Anumanchipalli
Anna Ivanova
91
0
0
21 Oct 2025
Some Attention is All You Need for Retrieval
Felix Michalak
Steven Abreu
97
0
0
21 Oct 2025
ECG-LLM-- training and evaluation of domain-specific large language models for electrocardiography
Lara Ahrens
Wilhelm Haverkamp
Nils Strodthoff
139
0
0
21 Oct 2025
The Free Transformer
François Fleuret
89
0
0
20 Oct 2025
MARS-M: When Variance Reduction Meets Matrices
Yifeng Liu
Angela Yuan
Q. Gu
230
1
0
20 Oct 2025
SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
Nishant Subramani
Alfredo Gomez
Mona T. Diab
129
0
0
20 Oct 2025
DynaKV: Enabling Accurate and Efficient Long-Sequence LLM Decoding on Smartphones
Tuowei Wang
Minxing Huang
Fengzu Li
Ligeng Chen
Jinrui Zhang
Ju Ren
201
1
0
20 Oct 2025
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu
Joachim Baumann
Lorenzo Lupo
Nigel Collier
Dirk Hovy
Paul Röttger
ALM
349
7
0
20 Oct 2025
Annotation-Efficient Universal Honesty Alignment
Shiyu Ni
Keping Bi
Jiafeng Guo
Minghao Tang
Jingtong Wu
Zengxin Han
Xueqi Cheng
HILM
158
1
0
20 Oct 2025
The Atomic Instruction Gap: Instruction-Tuned LLMs Struggle with Simple, Self-Contained Directives
Henry Lim
Kwan Hui Lim
LRM
99
0
0
20 Oct 2025
AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI
Manik Rana
Calissa Man
Anotida Expected Msiiwa
Jeffrey Paine
Kevin Zhu
Sunishchal Dev
Vasu Sharma
Ahan M R
LLMAG
85
0
0
20 Oct 2025
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Jiawei Zhang
Andrew Estornell
David D. Baek
B. Li
Xiaojun Xu
158
0
0
20 Oct 2025
JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs
Junlan Feng
Fanyu Meng
Chong Long
Pengyu Cong
Duqing Wang
...
Z. Ren
Fan Yang
Na Wu
Di Jin
Chao Deng
HILM
189
0
0
20 Oct 2025
Measuring Reasoning in LLMs: a New Dialectical Angle
Soheil Abbasloo
LRM
141
0
0
20 Oct 2025
Evaluating Medical LLMs by Levels of Autonomy: A Survey Moving from Benchmarks to Applications
Xiao Ye
Jacob Dineen
Zhaonan Li
Zhikun Xu
Weiyu Chen
...
Ji-Eun Irene Yum
Muhammad Ali Khan
Muhammad Umar Afzal
Irbaz B. Riaz
Ben Zhou
LM&MA, ELM
197
1
0
20 Oct 2025
Mapping Post-Training Forgetting in Language Models at Scale
Jackson Harmon
Andreas Hochlehnert
Matthias Bethge
Ameya Prabhu
CLL, KELM
159
0
0
20 Oct 2025
Vocab Diet: Reshaping the Vocabulary of LLMs with Vector Arithmetic
Yuval Reif
Guy Kaplan
Roy Schwartz
KELM
170
0
0
19 Oct 2025
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Kush Juvekar
Arghya Bhattacharya
Sai Khadloya
Utkarsh Saxena
AILaw, ELM
196
1
0
19 Oct 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Chih-Kai Yang
Yen-Ting Piao
Tzu-wen Hsu
Szu-Wei Fu
Zhehuai Chen
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
KELM, AuLLM
184
0
0
19 Oct 2025
DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking
Lanni Bu
Lauren Levin
Amir Zeldes
171
1
0
19 Oct 2025
Hierarchical Federated Unlearning for Large Language Models
Yisheng Zhong
Zhengbang Yang
Zhuangdi Zhu
MU
202
0
0
19 Oct 2025
Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
Yikai Zhang
Ye Rong
Siyu Yuan
Jiangjie Chen
Jian Xie
Yanghua Xiao
LLMAG, AAML, LRM
113
0
0
19 Oct 2025
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
Heming Zou
Yixiu Mao
Yun Qu
Qi Wang
Xiangyang Ji
183
1
0
19 Oct 2025
ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models
Emily Chang
Niyati Bafna
ELM
149
0
0
19 Oct 2025
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
Minhua Lin
Zongyu Wu
Zhichao Xu
Hui Liu
Xianfeng Tang
Qi He
Charu C. Aggarwal
Hui Liu
Xiang Zhang
Suhang Wang
AI4TS, LRM
575
2
0
19 Oct 2025
EditMark: Watermarking Large Language Models based on Model Editing
Shuai Li
Kejiang Chen
Jun Jiang
Jie Zhang
Qiyi Yao
K. Zeng
W. Zhang
N. Yu
WaLM, KELM
235
0
0
18 Oct 2025
MIN-Merging: Merge the Important Neurons for Model Merging
Yunfei Liang
MoMe
556
0
0
18 Oct 2025
When Models Can't Follow: Testing Instruction Adherence Across 256 LLMs
Richard J. Young
Brandon Gillins
Alice M. Matthews
ALM, ELM
165
2
0
18 Oct 2025
From Characters to Tokens: Dynamic Grouping with Hierarchical BPE
Rares Dolga
Lucas Maystre
Tudor Berariu
David Barber
117
0
0
17 Oct 2025
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Fei Wang
Li Shen
Liang Ding
Chao Xue
Ye Liu
Changxing Ding
171
0
0
17 Oct 2025
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
Dung V. Nguyen
Anh T. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
Shiqi Jiang
Ethan Fetaya
Linh Duy Tran
Gal Chechik
T. Nguyen
MoMe
193
1
0
17 Oct 2025
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
Huining Yuan
Zelai Xu
Zheyue Tan
Xiangmin Yi
Mo Guang
...
Xinlei Chen
Bo Zhao
Xiao-Ping Zhang
Chao Yu
Yu Wang
LLMAG, LRM
139
0
0
17 Oct 2025
Rethinking Cross-lingual Gaps from a Statistical Viewpoint
Vihari Piratla
Purvam Jain
Darshan Singh
Partha Talukdar
Trevor Cohn
112
0
0
17 Oct 2025
Demo: Guide-RAG: Evidence-Driven Corpus Curation for Retrieval-Augmented Generation in Long COVID
Philip DiGiacomo
Haoyang Wang
Jinrui Fang
Yan Leng
W Michael Brode
Ying Ding
117
0
0
17 Oct 2025
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Heecheol Yun
Kwangmin Ki
J. H. Lee
Eunho Yang
152
0
0
17 Oct 2025
HypoSpace: Evaluating LLM Creativity as Set-Valued Hypothesis Generators under Underdetermination
Tingting Chen
Beibei Lin
Zifeng Yuan
Qiran Zou
Hongyu Hè
Yew-Soon Ong
Anirudh Goyal
Dianbo Liu
87
1
0
17 Oct 2025
SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
Yang Feng
Xudong Pan
AAML
99
1
0
17 Oct 2025
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
Lina Berrayana
Ahmed Heakl
Muhammad Abdullah Sohail
Thomas Hofmann
Salman Khan
Wei Chen
185
1
0
17 Oct 2025
KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models
Dongjun Kim
Chanhee Park
Chanjun Park
Heuiseok Lim
ALM, ELM
154
0
0
17 Oct 2025
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
Gao Yang
Yuhang Liu
Siyu Miao
Xinyue Liang
Zhengyang Liu
Heyan Huang
134
0
0
17 Oct 2025
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction
Xu Shen
Qi Zhang
Song Wang
Zhen Tan
Xinyu Zhao
...
Vaishnav Tadiparthi
Hossein Nourkhiz Mahjoub
Ehsan Moradi-Pari
Kwonjoon Lee
Tianlong Chen
230
1
0
16 Oct 2025
Model-agnostic Selective Labeling with Provable Statistical Guarantees
Huipeng Huang
Wenbo Liao
Huajun Xi
Hao Zeng
Mengchen Zhao
Hongxin Wei
142
1
0
16 Oct 2025
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
Xukai Wang
Xuanbo Liu
Mingrui Chen
Haitian Zhong
Xuanlin Yang
...
Xu-Yao Zhang
Qiang Liu
Zhouchen Lin
Wentao Zhang
Bin Dong
ELM, LRM
171
1
0
16 Oct 2025
Finding Answers in Thought Matters: Revisiting Evaluation on Large Language Models with Reasoning
Hwiyeol Jo
Joosung Lee
J. H. Lee
Sang-Woo Lee
Joonsuk Park
Kang Min Yoo
ReLM, LRM
122
2
0
16 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat, AI4TS, LRM
334
0
0
16 Oct 2025
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Yingpeng Ning
Yuanyuan Sun
Ling Luo
Yanhua Wang
Yuchen Pan
Hongfei Lin
HILM
269
1
0
16 Oct 2025
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Rahul Nadkarni
Yanai Elazar
Hila Gonen
Noah A. Smith
KELM
152
0
0
16 Oct 2025
AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization
Mengtao Lv
Ruiqi Zhu
Xinyu Wang
Y. Li
MQ
158
0
0
16 Oct 2025
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
Samuel Paech
Allen Roush
Judah Goldfeder
Ravid Shwartz-Ziv
227
0
0
16 Oct 2025