Measuring Massive Multitask Language Understanding
arXiv:2009.03300
International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown
Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Tom Sühr
Florian E. Dorner
Olawale Salaudeen
Augustin Kelava
Samira Samadi
ALM, ELM
169
2
0
30 Jul 2025
Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
Kwesi Cobbina
Tianyi Zhou
132
4
0
30 Jul 2025
CUS-QA: Local-Knowledge-Oriented Open-Ended Question Answering Dataset
Jindřich Libovický
Jindřich Helcl
Andrei-Alexandru Manea
Gianluca Vico
191
2
0
30 Jul 2025
BALSAM: A Platform for Benchmarking Arabic Large Language Models
Rawan N. Al-Matham
Kareem Darwish
Raghad Al-Rasheed
Waad Alshammari
Muneera Alhoshan
...
Sultana Alghurabi
Atikah Alzeghayer
Afrah Altamimi
Abdullah Alfaifi
Abdulrahman AlOsaimy
ELM
224
2
0
30 Jul 2025
AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models
Lian Yan
Haotian Wang
Chen Tang
Haifeng Liu
Tianyang Sun
Liangliang Liu
Y. Guan
Jingchi Jiang
ELM
169
2
0
29 Jul 2025
Improving Task Diversity in Label Efficient Supervised Finetuning of LLMs
Abhinav Arabelly
Jagrut Nemade
R. Nowak
Jifan Zhang
ALM
204
0
0
29 Jul 2025
Strategic Deflection: Defending LLMs from Logit Manipulation
Yassine Rachidy
Jihad Rbaiti
Youssef Hmamouche
Faissal Sehbaoui
Amal El Fallah Seghrouchni
AAML, LLMSV
157
1
0
29 Jul 2025
Training language models to be warm and empathetic makes them less reliable and more sycophantic
Lujain Ibrahim
Franziska Sofia Hafner
Luc Rocher
231
11
0
29 Jul 2025
Evaluation and Benchmarking of LLM Agents: A Survey
Mahmoud Mohammadi
Yipeng Li
Jane Lo
Wendy Yip
LLMAG, ELM
421
36
0
29 Jul 2025
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Yanxu Zhu
Shitong Duan
Xiangxu Zhang
Jitao Sang
Peng Zhang
Tun Lu
Xiao Zhou
Jing Yao
Xiaoyuan Yi
Xing Xie
179
0
0
29 Jul 2025
ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge
Zihan Zhao
B. Chen
Ziping Wan
Lu Chen
Xuanze Lin
...
Huayang Wang
Zhongyang Dai
Liyang Wen
Xin Chen
Kai Yu
LRM, AI4CE
175
4
0
29 Jul 2025
Dissecting Persona-Driven Reasoning in Language Models via Activation Patching
Ansh Poonia
Maeghal Jain
227
0
0
28 Jul 2025
Kimi K2: Open Agentic Intelligence
Kimi Team
Yifan Bai
Yiping Bao
Guanduo Chen
Jiahao Chen
...
Qifeng Teng
Chensi Wang
Dinglu Wang
Feng Wang
Haiming Wang
MoE, VLM, LRM
188
84
0
28 Jul 2025
LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning
Yining Huang
Bin Li
Keke Tang
Meilian Chen
MoE, LRM
255
2
0
28 Jul 2025
Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model Distribution
Zhicheng Zhang
Peizhuo Lv
Mengke Wan
Jiang Fang
Diandian Guo
Yezeng Chen
Yinlong Liu
Wei Ma
Jiyan Sun
Liru Geng
267
0
0
28 Jul 2025
MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation
Adrien Bazoge
ELM
139
0
0
28 Jul 2025
ELMES: An Automated Framework for Evaluating Large Language Models in Educational Scenarios
Shouáng Wei
X. Wang
Shuzhen Bi
Jian Chen
Ruijia Li
...
M. Zhang
Yu Song
Bingdong Li
Aimin Zhou
Hao Hao
ELM
128
0
0
27 Jul 2025
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Guangchen Lan
Sipeng Zhang
Tianle Wang
Yuwei Zhang
Daoan Zhang
Xinpeng Wei
Xiaoman Pan
Hongming Zhang
Dong-Jun Han
Christopher G. Brinton
298
2
0
27 Jul 2025
SDD: Self-Degraded Defense against Malicious Fine-tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
ZiXuan Chen
Weikai Lu
Xin Lin
Ziqian Zeng
AAML
167
1
0
27 Jul 2025
RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation
Ran Xu
Yuchen Zhuang
Yue Yu
Haoyu Wang
W. Shi
Carl Yang
RALM, 3DV
144
3
0
26 Jul 2025
A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction
Xiaohua Feng
Jiaming Zhang
Fengyuan Yu
C. Wang
Li Zhang
Kaixiang Li
Yuyuan Li
Chaochao Chen
Jianwei Yin
MU
274
2
0
26 Jul 2025
Uncovering Cross-Linguistic Disparities in LLMs using Sparse Autoencoders
Richmond Sin Jing Xuan
Jalil Huseynov
Yang Zhang
145
0
0
25 Jul 2025
Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning
Shengyuan Wang
J. Feng
Tianhui Liu
Dan Pei
Yong Li
HILM
172
1
0
25 Jul 2025
Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks
Maitha Alshehhi
Ahmed Sharshar
Mohsen Guizani
141
1
0
25 Jul 2025
How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation under the One-Time-Pad-Based Framework
Zi Liang
Liantong Yu
Shiyu Zhang
Qingqing Ye
Haibo Hu
ELM
210
1
0
25 Jul 2025
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi
Maike Züfle
Marco Gaido
Beatrice Savoldi
Danni Liu
Ioannis Douros
L. Bentivogli
Jan Niehues
300
4
0
25 Jul 2025
CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages
Yilun Yang
Yekun Chai
144
0
0
24 Jul 2025
Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation
Shiyuan Li
Yixin Liu
Qingsong Wen
Chengqi Zhang
Shirui Pan
355
16
0
24 Jul 2025
Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory
Mutian Yang
Jiandong Gao
Ji Wu
191
1
0
24 Jul 2025
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Asaf Yehudai
Lilach Eden
Yotam Perlitz
Roy Bar-Haim
Michal Shmueli-Scheuer
ELM
210
1
0
24 Jul 2025
Technical Report of TeleChat2, TeleChat2.5 and T1
Zihan Wang
Xinzhang Liu
Yitong Yao
Chao Wang
Yu Zhao
...
Bingkai Yang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AI4TS, LRM
429
6
0
24 Jul 2025
StyleAdaptedLM: Enhancing Instruction Following Models with Efficient Stylistic Transfer
Pritika Ramu
Apoorv Saxena
Meghanath M Y
Varsha Sankar
Debraj Basu
116
0
0
24 Jul 2025
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMSV
189
1
0
24 Jul 2025
NeuralDB: Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
Weizhi Fei
Hao Shi
Jing Xu
Jingchen Peng
Jiazheng Li
Jingzhao Zhang
Bo Bai
Wei Han
Z. Chen
Xueyan Niu
KELM
179
0
0
24 Jul 2025
SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models
Wonjun Jeong
Dongseok Kim
Taegkeun Whangbo
234
1
0
24 Jul 2025
Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios
Zhuang Qiang Bok
Watson Wei Khong Chua
AIFin
149
0
0
24 Jul 2025
Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation
Xinrui Chen
Hongxing Zhang
Fanyi Zeng
Yongxian Wei
Yizhi Wang
Xitong Ling
Guanghao Li
Chun Yuan
153
1
0
24 Jul 2025
A Comprehensive Evaluation on Quantization Techniques for Large Language Models
Yutong Liu
Cairong Zhao
Guosheng Hu
MQ
224
0
0
23 Jul 2025
The Geometry of Harmfulness in LLMs through Subconcept Probing
McNair Shah
Saleena Angeline
Adhitya Rajendra Kumar
Naitik Chheda
Kevin Zhu
Sean O'Brien
Will Cai
LLMSV
239
3
0
23 Jul 2025
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Changxin Tian
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Zhiqiang Zhang
Jun Zhou
MoE
389
12
0
23 Jul 2025
MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
Alexander R. Fabbri
Diego Mares
Jorge Flores
Meher Mankikar
Ernesto Hernandez
Dean Lee
Bing Liu
Chen Xing
LRM
333
2
0
23 Jul 2025
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian
Jiapeng Wang
Qian Zhao
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Jiaxin Mao
Wayne Xin Zhao
Zhiqiang Zhang
Jun Zhou
MoMe, CLL
264
7
0
23 Jul 2025
Awakening LLMs' Reasoning Potential: A Fine-Grained Pipeline to Evaluate and Mitigate Vague Perception
Zipeng Ling
Yuehao Tang
Qi Zheng
Junqi Yang
Shenghong Fu
Chen Huang
Kejia Huang
Yao Wan
Zhichao Hou
Xuming Hu
LRM
385
2
0
22 Jul 2025
The Ever-Evolving Science Exam
Junying Wang
Zicheng Zhang
Yijin Guo
Farong Wen
Ye Shen
...
Wenzhe Li
Chunyi Li
Z. Chen
Qi Jia
Guangtao Zhai
ELM
347
3
0
22 Jul 2025
A Unifying Scheme for Extractive Content Selection Tasks
Shmuel Amar
Ori Shapira
Aviv Slobodkin
Ido Dagan
155
0
0
22 Jul 2025
Depth Gives a False Sense of Privacy: LLM Internal States Inversion
Tian Dong
Yan Meng
Shaofeng Li
Guoxing Chen
Zhen Liu
Haojin Zhu
AAML
174
2
0
22 Jul 2025
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Run-Ze Fan
Zengzhi Wang
Pengfei Liu
LRM
340
17
0
22 Jul 2025
DialogueForge: LLM Simulation of Human-Chatbot Dialogue
Ruizhe Zhu
Hao Zhu
Yaxuan Li
Syang Zhou
Shijing Cai
Malgorzata Lazuka
Elliott Ash
108
0
0
21 Jul 2025
Metric assessment protocol in the context of answer fluctuation on MCQ tasks
Ekaterina Goliakova
X. Renard
Marie-Jeanne Lesot
Thibault Laugel
Christophe Marsala
Marcin Detyniecki
137
0
0
21 Jul 2025
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
Kailai Yang
Xiao Liu
Lei Ji
Hao Li
Yeyun Gong
Peng Cheng
M. Yang
CLL
187
2
0
21 Jul 2025