Measuring Massive Multitask Language Understanding
arXiv:2009.03300

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 of 4,462 citing papers shown
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing
Yichi Zhang
Fangzheng Xie
Shu Yang
Chong Wu
24 Dec 2025
SoK: Are Watermarks in LLMs Ready for Deployment?
Kieu Dang
Phung Lai
Nhathai Phan
Yelong Shen
Ruoming Jin
Abdallah Khreishah
My T. Thai
24 Dec 2025
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
24 Dec 2025
MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
Yizhou Zhao
Zhiwei Steven Wu
Adam Block
03 Dec 2025
KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing
Lishuo Deng
Shaojie Xu
Jinwu Chen
Changwei Yan
Jiajie Wang
Zhe Jiang
Weiwei Shan
03 Dec 2025
Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers
H. Lin
Zhiqi Bai
X. Zhang
Sen Yang
Xiang Li
...
Yongchi Zhao
Jiamang Wang
Yuchi Xu
Wenbo Su
B. Zheng
03 Dec 2025
Log Probability Tracking of LLM APIs
Timothée Chauvin
Erwan Le Merrer
F. Taïani
Gilles Tredan
03 Dec 2025
Evaluating Hydro-Science and Engineering Knowledge of Large Language Models
S. Hu
Wenbo Shan
Yingjia Li
Zhiqi Wan
Xinpeng Yu
...
Chee Hui Lai
Wei Luo
Yubin He
Bin Xu
Jianshi Zhao
ELM, AI4CE
03 Dec 2025
Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
Amr Mohamed
Yang Zhang
Michalis Vazirgiannis
Guokan Shang
AI4CE
02 Dec 2025
Lumos: Let there be Language Model System Certification
Isha Chaudhary
Vedaant V. Jain
Avaljot Singh
Kavya Sachdeva
Sayan Ranu
Gagandeep Singh
02 Dec 2025
When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers
Jack Lu
Ryan Teehan
Jinran Jin
Mengye Ren
LRM
02 Dec 2025
LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems
Yuanhe Zhang
Weiliu Wang
Zhenhong Zhou
Kun Wang
Jie Zhang
Li Sun
Yang Liu
Sen Su
02 Dec 2025
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
Róbert Belanec
Ivan Srba
Maria Bielikova
ALM
02 Dec 2025
MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm
Wei Chen
Chaoqun Du
Feng Gu
Wei He
Qizhen Li
...
Pengfei Yu
Y. Zheng
Chunpeng Zhou
Pan Zhou
Xuhan Zhu
MLLM, OffRL, VLM
02 Dec 2025
InstructLR: A Scalable Approach to Create Instruction Dataset for Under-Resourced Languages
Mamadou K. Keita
Sébastien Diarra
Christopher Homan
Seydou Diallo
01 Dec 2025
KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference
Sai Gokhale
Devleena Das
Rajeev Patwari
Ashish Sirasao
Elliott Delaye
MQ
01 Dec 2025
Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging
Kuangpu Guo
Yuhe Ding
Jian Liang
Zilei Wang
Ran He
MoMe
01 Dec 2025
Rectifying LLM Thought from Lens of Optimization
J. Liu
Hongwei Liu
Songyang Zhang
Kai Chen
LRM
01 Dec 2025
Breaking It Down: Domain-Aware Semantic Segmentation for Retrieval Augmented Generation
Aparajitha Allamraju
Maitreya Prafulla Chitale
Hiranmai Sri Adibhatla
Rahul Mishra
Manish Shrivastava
29 Nov 2025
EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
Guoqing Ma
Jia Zhu
Hanghui Guo
Weijie Shi
Yue Cui
Jiawei Shen
Zilong Li
Yidan Liang
AI4Ed, ELM
29 Nov 2025
A Rosetta Stone for AI Benchmarks
A. Ho
Jean-Stanislas Denain
David Atanasov
Samuel Albanie
Rohin Shah
ELM
28 Nov 2025
AgentShield: Make MAS more secure and efficient
Kaixiang Wang
Zhaojiacheng Zhou
Bunyod Suvonov
Jiong Lou
Jie Li
AAML
28 Nov 2025
Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day
Milad Abdollahzadeh
Abdul Raheem
Zilong Zhao
Uzair Javaid
Kevin Yee
Nalam Venkata Abhishek
Tram Truong-Huu
Biplab Sikdar
LMTD, ALM
28 Nov 2025
MathSight: A Benchmark Exploring Have Vision-Language Models Really Seen in University-Level Mathematical Reasoning?
Yuandong Wang
Yao Cui
Yuxin Zhao
Zhen Yang
Yangfu Zhu
Zhenzhou Shao
CoGe, VLM, LRM
28 Nov 2025
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu
Zhanchao Zhou
Ruiqi Liang
Zehuan Li
Wei Wu
Jianguo Li
28 Nov 2025
OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning
Timothy Ossowski
Sheng Zhang
Qianchu Liu
Guanghui Qin
Reuben Tan
Tristan Naumann
Junjie Hu
Hoifung Poon
LRM
28 Nov 2025
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal
Sebastian Schmidt
Thorsten Graf
Thorsten Bagodonat
Stephan Günnemann
Leo Schwinn
27 Nov 2025
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
Guanxi Lu
Hao Mark Chen
Zhiqiang Que
Wayne Luk
Hongxiang Fan
MQ
27 Nov 2025
SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Peiran Xu
Sudong Wang
Yao Zhu
Jianing Li
Yunjian Zhang
LRM
26 Nov 2025
Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
Zhenchao Tang
Fang Wang
Haohuai He
Jiale Zhou
Tianxu Lv
...
Minghao Yang
Y. Wang
Jiayang Wu
Yidong Song
J. Yao
CLL
26 Nov 2025
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Róbert Belanec
Branislav Pecher
Ivan Srba
Maria Bielikova
26 Nov 2025
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Dongyang Fan
Diba Hashemi
Sai Praneeth Karimireddy
Martin Jaggi
26 Nov 2025
Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices
Paulo Cavalin
Cassia Sanctos
Marcelo Grave
Claudio S. Pinhanez
Yago Primerano
26 Nov 2025
On the Limits of Innate Planning in Large Language Models
Charles Schepanowski
Charles Ling
LLMAG, LRM, ELM
26 Nov 2025
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Yeganeh Kordi
Nihal V. Nayak
Max Zuo
Ilana Nguyen
Stephen H. Bach
26 Nov 2025
Subjective Depth and Timescale Transformers: Learning Where and When to Compute
Frederico Wieser
Martin A Benfeghoul
Haitham Bou-Ammar
Jun Wang
Zafeirios Fountas
26 Nov 2025
Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy
Daniel I Jackson
Emma L Jensen
Syed-Amad Hussain
Emre Sezgin
AI4MH, ELM
25 Nov 2025
Structured Prompting Enables More Robust Evaluation of Language Models
Asad Aali
Muhammad Ahmed Mohsin
Vasiliki Bikia
Arnav Singhvi
Richard Gaus
...
Sanmi Koyejo
Emily Alsentzer
Christopher Potts
N. Shah
Akshay Chaudhari
ELM, LRM
25 Nov 2025
Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries
Sree Bhattacharyya
Yaman Kumar Singla
Sudhir Yarram
Somesh Singh
Harini S I
James Z. Wang
25 Nov 2025
BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
Abdullah Al Sefat
25 Nov 2025
Vision-Language Memory for Spatial Reasoning
Zuntao Liu
Yi Du
Taimeng Fu
Shaoshu Su
Cherie Ho
Chen Wang
VLM, LRM
25 Nov 2025
Geometry of Decision Making in Language Models
Abhinav Joshi
Divyanshu Bhatt
Ashutosh Modi
AI4CE, LRM
25 Nov 2025
Representation Interventions Enable Lifelong Unstructured Knowledge Control
Xuyuan Liu
Zhengzhang Chen
Xinshuai Dong
Yanchi Liu
Xujiang Zhao
Shengyu Chen
Haoyu Wang
Yujun Yan
Haifeng Chen
KELM
25 Nov 2025
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
Yuanhao Li
Mingshan Liu
Hongbo Wang
Yiding Zhang
Yifei Ma
Wei Tan
AI4TS, KELM, LRM, AI4CE
25 Nov 2025
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu
Mingkuan Zhao
Shuangyong Song
Xiaoyan Zhu
Xin Lai
Jiayin Wang
25 Nov 2025
Mirror, Mirror on the Wall -- Which is the Best Model of Them All?
Dina Sayed
Heiko Schuldt
25 Nov 2025
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Junbo Zhang
Ran Chen
Qianli Zhou
Xinyang Deng
Wen Jiang
24 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
24 Nov 2025
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li
Y. Li
Hanxun Huang
Yunhao Chen
Xin Wang
Yixu Wang
Xingjun Ma
Yu-Gang Jiang
MLLM, AAML, VLM
24 Nov 2025
Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores
Congren Dai
Yue Yang
Krinos Li
Huichi Zhou
Shijie Liang
...
Peiyuan Jing
Kinhei Lee
Zhenxuan Zhang
Xiaobing Li
Maosong Sun
24 Nov 2025