Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,479 papers shown
Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks
Eileen Pan
A. S. G. Choi
Maartje ter Hoeve
Skyler Seto
Allison Koenecke
68
0
0
01 Oct 2025
The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
Thiziri Nait Saada
Louis Béthune
Michal Klein
David Grangier
Marco Cuturi
Pierre Ablin
134
1
0
01 Oct 2025
PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation
Yujia Xiao
Liumeng Xue
Lei He
Xinyi Chen
Aemon Yat Fei Chiu
...
Shaofei Zhang
Qiuqiang Kong
Xinfa Zhu
Wei Xue
Tan Lee
AuLLM, VGen
141
0
0
01 Oct 2025
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
Yinyi Luo
Z. Zhou
Hao Chen
Kai Qiu
Marios Savvides
Shouqing Yang
James Evans
KELM, MU
176
0
0
01 Oct 2025
When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs
Shree Harsha Bokkahalli Satish
G. Henter
Éva Székely
130
1
0
01 Oct 2025
Uncovering the Computational Ingredients of Human-Like Representations in LLMs
Zach Studdiford
Timothy T. Rogers
Kushin Mukherjee
Siddharth Suresh
152
0
0
01 Oct 2025
A-VERT: Agnostic Verification with Embedding Ranking Targets
Nicolás Aguirre
Ramiro Caso
Ramiro Rodríguez Colmeiro
Mauro Santelli
Joaquín Toranzo Calderón
116
0
0
01 Oct 2025
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
David Anugraha
Shou-Yi Hung
Zilu Tang
Annie En-Shiun Lee
Derry Wijaya
Genta Indra Winata
LRM
429
2
0
01 Oct 2025
Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare
Zhengliang Shi
Ruotian Ma
Jen-tse Huang
Xinbei Ma
Xingyu Chen
...
Wenxuan Wang
Zhaopeng Tu
Xiaolong Li
Zhaochun Ren
Linus
LLMAG
350
0
0
01 Oct 2025
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
Yicheng Lang
Yihua Zhang
Chongyu Fan
Changsheng Wang
Jinghan Jia
Sijia Liu
MU
345
0
0
01 Oct 2025
Learning Compact Representations of LLM Abilities via Item Response Theory
Jianhao Chen
Chenxu Wang
G. Zhang
Peng Ye
Lei Bai
Wei Hu
Yuzhong Qu
Shuyue Hu
93
0
0
01 Oct 2025
Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information
Rui Ai
Yuqi Pan
David Simchi-Levi
Milind Tambe
Haifeng Xu
120
2
0
01 Oct 2025
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
Chen-An Li
Tzu-Han Lin
Hung-yi Lee
AuLLM
140
0
0
01 Oct 2025
Hearing the Order: Investigating Selection Bias in Large Audio-Language Models
Yu-Xiang Lin
Chen-An Li
Sheng-Lun Wei
Po-Chun Chen
Hsin-Hsi Chen
Hung-yi Lee
120
0
0
01 Oct 2025
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Xudong Zhu
Mohammad Mahdi Khalili
Zhihui Zhu
232
0
0
01 Oct 2025
GRAD: Generative Retrieval-Aligned Demonstration Sampler for Efficient Few-Shot Reasoning
Oussama Gabouj
Kamel Charaf
Ivan Zakazov
Nicolas Mario Baldwin
Robert West
LLMAG, RALM, LRM
84
0
0
01 Oct 2025
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
Xingjian Zhao
Zhe Xu
Qinyuan Cheng
Zhaoye Fei
Luozhijie Jin
...
Yitian Gong
Yuanfan Xu
Yaqian Zhou
Xuanjing Huang
Xipeng Qiu
AuLLM
242
2
0
01 Oct 2025
LLM Routing with Dueling Feedback
Chao-Kai Chiang
Takashi Ishida
Masashi Sugiyama
112
0
0
01 Oct 2025
The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation
Zarreen Reza
LLMAG
60
1
0
01 Oct 2025
The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks
Arda Uzunoglu
Tianjian Li
Daniel Khashabi
162
0
0
30 Sep 2025
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Jingdi Lei
Varun Gumma
Rishabh Bhardwaj
Seok Min Lim
Chuan Li
Amir Zadeh
Soujanya Poria
LLMAG, ALM, ELM
215
0
0
30 Sep 2025
CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models
Weiyu Huang
Yuezhou Hu
Jun Zhu
Jianfei Chen
CLL
100
0
0
30 Sep 2025
Unspoken Hints: Accuracy Without Acknowledgement in LLM Reasoning
Arash Marioriyad
Shaygan Adim
Nima Alighardashi
Mahdieh Soleymani Banghshah
M. Rohban
LRM
79
1
0
30 Sep 2025
Feedback Forensics: A Toolkit to Measure AI Personality
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Robert Mullins
117
0
0
30 Sep 2025
Towards Ecologically Valid LLM Benchmarks: Understanding and Designing Domain-Centered Evaluations for Journalism Practitioners
Charlotte Li
Nick Hagar
Sachita Nishal
Jeremy Gilbert
Nick Diakopoulos
89
0
0
30 Sep 2025
Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions
Junbeom Kim
Kyuyoung Kim
Jihoon Tack
Dongha Lim
Jinwoo Shin
MU, KELM
137
1
0
30 Sep 2025
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation
Miao Rang
Zhenni Bi
Hang Zhou
Hanting Chen
An Xiao
Tianyu Guo
Kai Han
Xinghao Chen
Yunhe Wang
141
1
0
30 Sep 2025
Collaborative Compression for Large-Scale MoE Deployment on Edge
Yixiao Chen
Yanyue Xie
Ruining Yang
Wei Jiang
Wei Wang
Yong He
Yue Chen
Pu Zhao
Y. Wang
MQ
84
0
0
30 Sep 2025
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Yein Park
Jungwoo Park
Jaewoo Kang
156
0
0
30 Sep 2025
Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
Yang Tang
Ruijie Liu
Yifan Wang
Shiyu Li
Xi Chen
98
0
0
30 Sep 2025
RL-Guided Data Selection for Language Model Finetuning
Animesh Jha
Harshit Gupta
Ananjan Nandi
OffRL
244
0
0
30 Sep 2025
Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
Yaoxiang Wang
Qingguo Hu
Yucheng Ding
Ruizhe Wang
Yeyun Gong
Jian Jiao
Yelong Shen
Peng Cheng
Jinsong Su
MoE
68
1
0
30 Sep 2025
RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models
Dragos-Dumitru Ghinea
Adela-Nicoleta Corbeanu
Adrian-Marius Dumitran
96
0
0
30 Sep 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Minhui Zhu
Minyang Tian
Xiaocheng Yang
Tianci Zhou
Lifan Yuan
...
Ruixing Zhang
X. Wang
Ofir Press
Nicolas Chia
Eliu A. Huerta
LRM, ELM
114
2
0
30 Sep 2025
TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning
Seohyun Lee
Wenzhi Fang
Dong-Jun Han
Seyyedali Hosseinalipour
Christopher G. Brinton
112
0
0
30 Sep 2025
AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations
Berdymyrat Ovezmyradov
76
0
0
30 Sep 2025
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
Mingjin Li
Yu Liu
Huayi Liu
Xiang Ye
Chao Jiang
Hongguang Zhang
Yu Ruan
200
2
0
30 Sep 2025
Nudging the Boundaries of LLM Reasoning
Justin Chih-Yao Chen
Becky Xiangyu Peng
Prafulla Kumar Choubey
Kung-Hsiang Huang
Jiaxin Zhang
Mohit Bansal
Chien-Sheng Wu
LRM
136
1
0
30 Sep 2025
Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems
Aakriti Agrawal
R. Aralikatti
Anirudh Satheesh
Souradip Chakraborty
Amrit Singh Bedi
Furong Huang
LRM
116
1
0
30 Sep 2025
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Huu Nguyen
Victor May
Harsh Raj
Marianna Nezhurina
Yishan Wang
...
Aleksandra Krasnodębska
Christoph Schuhmann
Mats Leon Richter
Xuan-Son
J. Jitsev
203
1
0
29 Sep 2025
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
Tianrui Qin
Qianben Chen
S. Wang
He Xing
King Zhu
...
G. Zhang
Jiaheng Liu
Yuchen Eleanor Jiang
Xitong Gao
Wangchunshu Zhou
LLMAG, LRM
159
5
0
29 Sep 2025
Knowledge Editing with Subspace-Aware Key-Value Mappings
Haewon Park
Sangwoo Kim
Yohan Jo
KELM
280
0
0
29 Sep 2025
From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing
Rana Shahout
Colin Cai
Yilun Du
Minlan Yu
Michael Mitzenmacher
MoE, MoMe
151
3
0
29 Sep 2025
Expanding Computation Spaces of LLMs at Inference Time
Yoonna Jang
Kisu Yang
Isabelle Augenstein
LLMAG, ReLM, LRM
68
0
0
29 Sep 2025
Query Circuits: Explaining How Language Models Answer User Prompts
Tung-Yu Wu
Fazl Barez
ReLM, LRM
137
0
0
29 Sep 2025
Fingerprinting LLMs via Prompt Injection
Yuepeng Hu
Zhengyuan Jiang
Mengyuan Li
Osama Ahmed
Zhicong Huang
Cheng Hong
Neil Zhenqiang Gong
178
0
0
29 Sep 2025
Mechanisms of Matter: Language Inferential Benchmark on Physicochemical Hypothesis in Materials Synthesis
Yingming Pu
Tao Lin
Hongyu Chen
145
0
0
29 Sep 2025
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Jitai Hao
Hao Liu
Xinyan Xiao
Qiang Huang
Jun Yu
204
0
0
29 Sep 2025
Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
M. R
Dan John Velasco
89
0
0
29 Sep 2025
Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns
Hanqi Xiao
Vaidehi Patil
Hyunji Lee
Elias Stengel-Eskin
Mohit Bansal
164
1
0
29 Sep 2025