Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,430 papers shown
Title
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
Dongjun Kim
Gyuho Shim
YongChan Chun
Minhyuk Kim
Chanjun Park
Heuiseok Lim
100
1
0
23 Sep 2025
Experience Scaling: Post-Deployment Evolution For Large Language Models
Xingkun Yin
Kaibin Huang
Dong In Kim
Hongyang Du
84
0
0
23 Sep 2025
Soft Tokens, Hard Truths
Natasha Butt
Ariel Kwiatkowski
Ismail Labiad
Julia Kempe
Yann Ollivier
OffRL, CLL, LRM
103
1
0
23 Sep 2025
What Does Your Benchmark Really Measure? A Framework for Robust Inference of AI Capabilities
Nathanael Jo
Ashia Wilson
ELM
110
0
0
23 Sep 2025
ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities
Aleksis Datseris
Sylvia Vassileva
Ivan Koychev
Svetla Boytcheva
56
0
0
23 Sep 2025
Reinforcement Learning on Pre-Training Data
Siheng Li
Kejiao Li
Zenan Xu
Guanhua Huang
Evander Yang
...
Jianchen Zhu
W. Lam
Wayyt Wang
Bo Zhou
Di Wang
OffRL, LRM
122
2
0
23 Sep 2025
AgentInit: Initializing LLM-based Multi-Agent Systems via Diversity and Expertise Orchestration for Effective and Efficient Collaboration
Chunhao Tian
Yutong Wang
Xuebo Liu
Zhexuan Wang
Liang Ding
Miao Zhang
Min Zhang
AI4CE
110
0
0
23 Sep 2025
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Yingming Zheng
Hanqi Li
Kai Yu
Lu Chen
193
0
0
23 Sep 2025
MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents
Yuzhen Lei
Hongbin Xie
Jiaxing Zhao
Shuangxue Liu
Xuan Song
LRM
45
0
0
22 Sep 2025
Evaluating the Safety and Skill Reasoning of Large Reasoning Models Under Compute Constraints
Adarsha Balaji
Le Chen
R. Thakur
Franck Cappello
Sandeep Madireddy
LRM
60
0
0
22 Sep 2025
Probabilistic Token Alignment for Large Language Model Fusion
Runjia Zeng
James Liang
Cheng Han
Zhiwen Cao
Jiahao Liu
...
Yingjie Victor Chen
Lifu Huang
Tong Geng
Qifan Wang
Dongfang Liu
108
1
0
21 Sep 2025
seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs
Mohammad Ramezanali
Mo Vazifeh
Paolo Santi
LRM, ELM
56
0
0
21 Sep 2025
PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models
He Xiao
Runming Yang
Qingyao Yang
Wendong Xu
Zheng Li
Yupeng Su
Zhengwu Liu
Hongxia Yang
Ngai Wong
MQ
64
0
0
21 Sep 2025
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
Mohammad Beigi
Ying Shen
Parshin Shojaee
Qifan Wang
Zichao Wang
Chandan K. Reddy
Ming Jin
Lifu Huang
LRM
62
0
0
20 Sep 2025
Challenging the Evaluator: LLM Sycophancy Under User Rebuttal
Sungwon Kim
Daniel Khashabi
ELM
94
0
0
20 Sep 2025
Can an Individual Manipulate the Collective Decisions of Multi-Agents?
Fengyuan Liu
Rui Zhao
Shuo Chen
Guohao Li
Juil Sock
Lei Han
Jindong Gu
AAML, LLMAG
159
1
0
20 Sep 2025
LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts
Junhao Chen
Jingbo Sun
Xiang Li
Haidong Xin
Yuhao Xue
Yibin Xu
Hao Zhao
LLMAG, ELM, LRM
140
0
0
20 Sep 2025
Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
Keliang Liu
Dingkang Yang
Ziyun Qian
Weijie Yin
Y. Wang
Hongsheng Li
Jun Liu
Peng Zhai
Y. Liu
Lihua Zhang
OffRL, LRM
162
5
0
20 Sep 2025
GPO: Learning from Critical Steps to Improve LLM Reasoning
Jiahao Yu
Zelei Cheng
Xian Wu
Xinyu Xing
LRM
143
2
0
19 Sep 2025
DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning
Sikai Bai
Haoxi Li
Jie Zhang
Zicong Hong
Song Guo
MoE
70
1
0
19 Sep 2025
Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research
Richard Diehl Martinez
David Demitri Africa
Yuval Weiss
Suchir Salhan
Ryan Daniels
P. Buttery
100
1
0
19 Sep 2025
Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models
Stephen Fitz
P. Romero
Steven Basart
Sipeng Chen
Jose Hernandez-Orallo
96
1
0
19 Sep 2025
RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering
Weikang Qiu
Tinglin Huang
Ryan Rullo
Yucheng Kuang
Ali Maatouk
S. Raquel Ramos
Rex Ying
LM&MA, AI4MH, ELM
269
0
0
19 Sep 2025
SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection
Maithili Joshi
Palash Nandi
Tanmoy Chakraborty
AAML, LLMSV
56
0
0
19 Sep 2025
Robust LLM Training Infrastructure at ByteDance
Symposium on Operating Systems Principles (SOSP), 2025
Borui Wan
Gaohong Liu
Zuquan Song
Jun Wang
Yun-feng Zhang
...
Yanghua Peng
H. Lin
W. L. Xiao
Xin Liu
Liang Xiang
274
3
0
19 Sep 2025
Rationality Check! Benchmarking the Rationality of Large Language Models
Zhilun Zhou
Jing Yi Wang
Nicholas Sukiennik
Chen Gao
Fengli Xu
Yong Li
James Evans
LRM
96
0
0
18 Sep 2025
ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance
Hannah Sterz
Fabian David Schmidt
Goran Glavaš
Ivan Vulić
MoMe, LLMSV
124
1
0
18 Sep 2025
Enhancing Retrieval Augmentation via Adversarial Collaboration
Letian Zhang
G. MEng
Xudong Ren
Yiming Wang
Shu-Tao Xia
73
1
0
18 Sep 2025
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Yeongbin Seo
Dongha Lee
Jaehyung Kim
Jinyoung Yeo
143
0
0
18 Sep 2025
Quantifying Self-Awareness of Knowledge in Large Language Models
Yeongbin Seo
Dongha Lee
Jinyoung Yeo
HILM
76
0
0
18 Sep 2025
KAIO: A Collection of More Challenging Korean Questions
Nahyun Lee
Guijin Son
Hyunwoo Ko
Kyubeen Han
ELM, VLM
64
1
0
18 Sep 2025
The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior
Angelina Wang
James Grimmelmann
Sanmi Koyejo
OffRL
151
1
0
18 Sep 2025
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages
Yujia Hu
Ming Shan Hee
Preslav Nakov
Roy Ka-wei Lee
111
0
0
18 Sep 2025
MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models
Siyu Yan
Long Zeng
Xuecheng Wu
Chengcheng Han
Kongcheng Zhang
Chong Peng
Xuezhi Cao
Xunliang Cai
Chenjuan Guo
AAML
98
0
0
18 Sep 2025
Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction
Yuanbo Xie
Yingjie Zhang
Tianyun Liu
Duohe Ma
Tingwen Liu
AAML
103
1
0
18 Sep 2025
Do Large Language Models Understand Word Senses?
Domenico Meconi
Simone Stirpe
Federico Martelli
Leonardo Lavalle
Roberto Navigli
76
0
0
17 Sep 2025
Enhancing Multi-Agent Debate System Performance via Confidence Expression
Zijie Lin
Bryan Hooi
LLMAG
91
1
0
17 Sep 2025
GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing
Silan Hu
Shiqi Zhang
Yimin Shi
Xiaokui Xiao
88
1
0
17 Sep 2025
Synthetic bootstrapped pretraining
Zitong Yang
Aonan Zhang
Hong Liu
Tatsunori Hashimoto
Emmanuel Candès
Chong-Jun Wang
Ruoming Pang
SyDa
139
0
0
17 Sep 2025
DSFT: Inspiring Diffusion Large Language Models to Comprehend Mathematical and Logical Patterns
Ranfei Chen
Ming Chen
DiffM, AI4CE
53
0
0
17 Sep 2025
ZERA: Zero-init Instruction Evolving Refinement Agent - From Zero Instructions to Structured Prompts via Principle-based Optimization
Seungyoun Yi
Minsoo Khang
Sungrae Park
LLMAG
44
0
0
17 Sep 2025
SAIL-VL2 Technical Report
Weijie Yin
Yongjie Ye
Fangxun Shu
Yue Liao
Zijian Kang
...
Han Wang
Wenzhuo Liu
Xiao Liang
Shuicheng Yan
Chao Feng
LRM, VLM
240
2
0
17 Sep 2025
Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Yangning Li
Tingwei Lu
Yinghui Li
Yankai Chen
Wei-Chieh Huang
Wenhao Jiang
Hui Wang
Hai-Tao Zheng
Philip S. Yu
182
0
0
17 Sep 2025
Don't Forget the Nonlinearity: Unlocking Activation Functions in Efficient Fine-Tuning
Bo Yin
Xingyi Yang
Xinchao Wang
85
1
0
16 Sep 2025
The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
Jeremias Lino Ferrao
Matthijs van der Lende
Ilija Lichkovski
Clement Neo
LLMSV
192
0
0
16 Sep 2025
Towards mitigating information leakage when evaluating safety monitors
Gerard Boxo
Aman Neelappa
Shivam Raval
AAML
96
0
0
16 Sep 2025
Bhaasha, Bhasa, Zaban: A Survey for Low-Resourced Languages in South Asia - Current Stage and Challenges
Sampoorna Poria
Xiaolei Huang
160
0
0
15 Sep 2025
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
Sangjun Lee
Seung-taek Woo
Jungyu Jin
Changhun Lee
Eunhyeok Park
MQ
81
2
0
15 Sep 2025
CBP-Tuning: Efficient Local Customization for Black-box Large Language Models
Jiaxuan Zhao
Naibin Gu
Yuchen Feng
Xiyu Liu
Peng Fu
Zheng Lin
Weiping Wang
56
0
0
15 Sep 2025
Preservation of Language Understanding Capabilities in Speech-aware Large Language Models
Marek Kubis
Paweł Skórzewski
Iwona Christop
Mateusz Czyżnikiewicz
Jakub Kubiak
Łukasz Bondaruk
Marcin Lewandowski
AuLLM, ELM
142
0
0
15 Sep 2025