Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,428 papers shown
Title
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Dongyang Fan
Diba Hashemi
Sai Praneeth Karimireddy
Martin Jaggi
73
0
0
26 Nov 2025
Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
Zhenchao Tang
Fang Wang
Haohuai He
Jiale Zhou
Tianxu Lv
...
Minghao Yang
Y. Wang
Jiayang Wu
Yidong Song
J. Yao
CLL
370
0
0
26 Nov 2025
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Yeganeh Kordi
Nihal V. Nayak
Max Zuo
Ilana Nguyen
Stephen H. Bach
78
0
0
26 Nov 2025
Subjective Depth and Timescale Transformers: Learning Where and When to Compute
Frederico Wieser
Martin A Benfeghoul
Haitham Bou-Ammar
Jun Wang
Zafeirios Fountas
82
0
0
26 Nov 2025
SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Peiran Xu
Sudong Wang
Yao Zhu
Jianing Li
Yunjian Zhang
LRM
230
0
0
26 Nov 2025
On the Limits of Innate Planning in Large Language Models
Charles Schepanowski
Charles Ling
LLMAG, LRM, ELM
365
0
0
26 Nov 2025
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Róbert Belanec
Branislav Pecher
Ivan Srba
Maria Bielikova
88
1
0
26 Nov 2025
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
Yuanhao Li
Mingshan Liu
Hongbo Wang
Yiding Zhang
Yifei Ma
Wei Tan
AI4TS, KELM, LRM, AI4CE
317
0
0
25 Nov 2025
Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy
Daniel I Jackson
Emma L Jensen
Syed-Amad Hussain
Emre Sezgin
AI4MH, ELM
227
0
0
25 Nov 2025
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu
Mingkuan Zhao
Shuangyong Song
Xiaoyan Zhu
Xin Lai
Jiayin Wang
67
1
0
25 Nov 2025
BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
Abdullah Al Sefat
76
0
0
25 Nov 2025
Representation Interventions Enable Lifelong Unstructured Knowledge Control
Xuyuan Liu
Zhengzhang Chen
Xinshuai Dong
Yanchi Liu
Xujiang Zhao
Shengyu Chen
Haoyu Wang
Yujun Yan
Haifeng Chen
KELM
44
0
0
25 Nov 2025
Geometry of Decision Making in Language Models
Abhinav Joshi
Divyanshu Bhatt
Ashutosh Modi
AI4CE, LRM
210
0
0
25 Nov 2025
Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries
Sree Bhattacharyya
Yaman Kumar Singla
Sudhir Yarram
Somesh Singh
Harini S I
James Z. Wang
48
0
0
25 Nov 2025
Vision-Language Memory for Spatial Reasoning
Zuntao Liu
Yi Du
Taimeng Fu
Shaoshu Su
Cherie Ho
Chen Wang
VLM, LRM
129
0
0
25 Nov 2025
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li
Y. Li
Hanxun Huang
Yunhao Chen
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
MLLM, AAML, VLM
148
0
0
24 Nov 2025
EAGER: Edge-Aligned LLM Defense for Robust, Efficient, and Accurate Cybersecurity Question Answering
Onat Gungor
Roshan Sood
Jiasheng Zhou
T. Rosing
AAML
29
0
0
24 Nov 2025
Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores
Congren Dai
Yue Yang
Krinos Li
Huichi Zhou
Shijie Liang
...
Peiyuan Jing
Kinhei Lee
Zhenxuan Zhang
Xiaobing Li
Maosong Sun
44
0
0
24 Nov 2025
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Junbo Zhang
Ran Chen
Qianli Zhou
Xinyang Deng
Wen Jiang
85
0
0
24 Nov 2025
Doubly Wild Refitting: Model-Free Evaluation of High Dimensional Black-Box Predictions under Convex Losses
Haichen Hu
David Simchi-Levi
48
0
0
24 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
98
0
0
24 Nov 2025
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Kairong Luo
Zhenbo Sun
Haodong Wen
Xinyu Shi
Jiarui Cui
Chenyi Dang
Kaifeng Lyu
Wenguang Chen
120
1
0
24 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG, MU, MoE, LRM
280
0
0
23 Nov 2025
Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets
Gowtham
Sai Rupesh
Sanjay Kumar
Saravanan
Venkata Chaithanya
VLM
157
0
0
22 Nov 2025
SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization
Jianghao Wu
Yasmeen George
Jin Ye
Y. Wu
Daniel F. Schmidt
Jianfei Cai
LRM
48
0
0
22 Nov 2025
PARROT: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
Yusuf Çelebi
Mahmoud El Hussieni
Özay Ezerceli
AAML
196
0
0
21 Nov 2025
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models
Vy Nguyen
Ziqi Xu
J. Chan
Estrid He
Feng Xia
Xiuzhen Zhang
60
0
0
21 Nov 2025
Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
G. Carneiro
Jianfei Cai
Thanh-Toan Do
MQ
78
0
0
21 Nov 2025
E$^3$-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models
Tao Yuan
Haoli Bai
Yinfei Pan
Xuyang Cao
Tianyu Zhang
Lu Hou
Ting Hu
Xianzhi Yu
VLM
139
0
0
21 Nov 2025
Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
G. Carneiro
Thanh-Toan Do
MQ
57
0
0
21 Nov 2025
The Impact of Off-Policy Training Data on Probe Generalisation
Nathalie Kirch
Samuel Dower
Adrians Skapars
Ekdeep Singh Lubana
Dmitrii Krasheninnikov
56
0
0
21 Nov 2025
Fantastic Bugs and Where to Find Them in AI Benchmarks
Sang Truong
Yuheng Tu
Michael Hardy
Anka Reuel
Zeyu Tang
...
Jonathan Perera
Chibuike Uwakwe
Ben Domingue
Nick Haber
Sanmi Koyejo
64
0
0
20 Nov 2025
AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser
Ren Ma
Jiantao Qiu
Chao Xu
Pei Chu
Kaiwen Liu
...
Wentao Zhang
Zhongying Tu
Wentao Zhang
Dahua Lin
Conghui He
56
0
0
20 Nov 2025
Monte Carlo Expected Threat (MOCET) Scoring
Joseph Kim
Saahith Potluri
ELM
64
0
0
20 Nov 2025
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Peng Xia
K. Zeng
Jiaqi Liu
Can Qin
Fang Wu
Yiyang Zhou
Caiming Xiong
Huaxiu Yao
LLMAG, LM&Ro, SyDa
496
1
0
20 Nov 2025
Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models
Haidong Kang
Lihong Lin
Enneng Yang
Hongning Dai
Hao Wang
LRM
132
0
0
19 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
190
1
0
19 Nov 2025
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Xiaoxuan Wang
Bo Liu
Song Jiang
Jingzhou Liu
Jingyuan Qi
Xia Chen
Baosheng He
LRM
92
0
0
19 Nov 2025
Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone
Vaibhav Singh
Oleksiy Ostapenko
Pierre-Andre Noel
Torsten Scholak
Mamba, AI4CE
288
0
0
19 Nov 2025
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
Hongwei Liu
J. Liu
Shudong Liu
Haodong Duan
Yuqiang Li
...
Conghui He
Qi Zhang
Songyang Zhang
Lei Bai
Kai Chen
LRM, ALM, ELM
311
0
0
18 Nov 2025
Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models
Sushant Mehta
70
0
0
18 Nov 2025
AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models
Declan Jackson
William Keating
George Cameron
Micah Hill-Smith
HILM, RALM, ELM
416
0
0
17 Nov 2025
Why is "Chicago" Predictive of Deceptive Reviews? Using LLMs to Discover Language Phenomena from Lexical Cues
Jiaming Qu
Mengtian Guo
Yue Wang
80
0
0
17 Nov 2025
Dynamic Template Selection for Output Token Generation Optimization: MLP-Based and Transformer Approaches
Bharadwaj Yadavalli
93
0
0
17 Nov 2025
SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
Jiacheng Wang
Yejun Zeng
Jinyang Guo
Yuqing Ma
Aishan Liu
Xianglong Liu
MQ
221
1
0
17 Nov 2025
SGuard-v1: Safety Guardrail for Large Language Models
JoonHo Lee
HyeonMin Cho
Jaewoong Yun
Hyunjae Lee
JunKyu Lee
Juree Seok
56
0
0
16 Nov 2025
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Baolong Bi
Shenghua Liu
Yiwei Wang
Siqian Tong
Lingrui Mei
Yuyao Ge
Yilong Xu
Jiafeng Guo
Xueqi Cheng
OffRL, LRM
172
3
0
15 Nov 2025
On the Measure of a Model: From Intelligence to Generality
Ruchira Dhar
Ninell Oldenburg
Anders Soegaard
ELM
93
0
0
14 Nov 2025
Radiology Workflow-Guided Hierarchical Reinforcement Fine-Tuning for Medical Report Generation
Bodong Du
Honglong Yang
Xiaomeng Li
132
5
0
13 Nov 2025
LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
Huimin Ren
Yan Liang
Baiqiao Su
Chaobo Sun
Hengtong Lu
Kaike Zhang
Chen Wei
ELM
64
0
0
13 Nov 2025