Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
arXiv: 2009.03300

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,428 papers shown
Title
Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything
Huawei Lin
Yunzhi Shi
Tong Geng
Weijie Zhao
Wei Wang
Ravender Pal Singh
LLMAG, VLM, LRM
189
0
0
04 Nov 2025
Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
Drago Plečko
Patrik Okanovic
Torsten Hoefler
Elias Bareinboim
112
0
0
04 Nov 2025
Cache Mechanism for Agent RAG Systems
Shuhang Lin
Zhencan Peng
Lingyao Li
Xiao Lin
Xi Zhu
Yongfeng Zhang
77
0
0
04 Nov 2025
TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Changjiang Jiang
Fengchang Yu
H. Chen
Wei Lu
Jin Zeng
LMTD, ReLM
246
0
0
04 Nov 2025
Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI
Sharan Maiya
Henning Bartsch
Nathan Lambert
Evan Hubinger
78
0
0
03 Nov 2025
The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation
İbrahim Ethem Deveci
Duygu Ataman
ReLM, ALM, ELM, LRM
135
0
0
03 Nov 2025
A Detailed Study on LLM Biases Concerning Corporate Social Responsibility and Green Supply Chains
Greta Ontrup
Annika Bush
Markus Pauly
Meltem Aksoy
72
0
0
03 Nov 2025
KV Cache Transform Coding for Compact Storage in LLM Inference
Konrad Staniszewski
Adrian Łańcucki
VLM
216
0
0
03 Nov 2025
Synthetic Eggs in Many Baskets: The Impact of Synthetic Data Diversity on LLM Fine-Tuning
Max Schaffelder
Albert Gatt
SyDa
94
0
0
03 Nov 2025
Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation
Hung-Shin Lee
Chen-Chi Chang
Ching-Yuan Chen
Yun-Hsiang Hsu
83
0
0
03 Nov 2025
EngChain: A Symbolic Benchmark for Verifiable Multi-Step Reasoning in Engineering
Ayesha Gull
Muhammad Usman Safder
Rania Elbadry
Preslav Nakov
Zhuohan Xie
ELM, LRM
176
0
0
03 Nov 2025
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Md Tanvirul Alam
Dipkamal Bhusal
Salman Ahmad
Nidhi Rastogi
Peter Worth
ELM
179
0
0
03 Nov 2025
Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering
Vlad Negoita
Mihai Masala
Traian Rebedea
62
0
0
02 Nov 2025
Assessing LLM Reasoning Steps via Principal Knowledge Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Hyeon Hwang
Yewon Cho
Chanwoong Yoon
Yein Park
Minju Song
Kyungjae Lee
Gangwoo Kim
Jaewoo Kang
ELM, LRM
194
0
0
02 Nov 2025
Two Datasets Are Better Than One: Method of Double Moments for 3-D Reconstruction in Cryo-EM
Joe Kileel
Oscar Mickelin
A. Singer
Sheng Xu
36
0
0
02 Nov 2025
HIP-LLM: A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models
Robab Aghazadeh-Chakherlou
Qing Guo
Siddartha Khastgir
Peter Popov
Xiaoge Zhang
Xingyu Zhao
105
0
0
01 Nov 2025
A CPU-Centric Perspective on Agentic AI
Ritik Raj
Hong Wang
Tushar Krishna
149
0
0
01 Nov 2025
Calibration Across Layers: Understanding Calibration Evolution in LLMs
Abhinav Joshi
A. Ahmad
Ashutosh Modi
UQCV
205
2
0
31 Oct 2025
Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
Deokhyung Kang
Seonjeong Hwang
Daehui Kim
Hyounghun Kim
Gary Geunbae Lee
LRM
72
0
0
31 Oct 2025
Language Modeling With Factorization Memory
Lee Xiong
Maksim Tkachenko
Johanes Effendi
Ting Cai
RALM, LRM
185
0
0
31 Oct 2025
TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control
Yuxiang Chen
Xiaoming Xu
Pengle Zhang
Michael Beyer
Martin Rapp
Jun Zhu
Jianfei Chen
MQ
105
0
0
31 Oct 2025
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan
Alexander Matt Turner
Mark Kurzeja
David Elson
Rohin Shah
179
0
0
31 Oct 2025
Thought Branches: Interpreting LLM Reasoning Requires Resampling
Uzay Macar
Paul C. Bogdan
Senthooran Rajamanoharan
Neel Nanda
LRM
60
0
0
31 Oct 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
378
1
0
31 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
88
2
0
30 Oct 2025
OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education
Min Zhang
Hao Chen
Hao Chen
Wenqi Zhang
Didi Zhu
Xin Lin
Bo Jiang
Aimin Zhou
Fei Wu
Kun Kuang
ELM
116
0
0
30 Oct 2025
Value Drifts: Tracing Value Alignment During LLM Post-Training
Mehar Bhatia
Shravan Nayak
Gaurav Kamath
Marius Mosbach
Karolina Stañczak
Vered Shwartz
Siva Reddy
96
1
0
30 Oct 2025
RCScore: Quantifying Response Consistency in Large Language Models
Dongjun Jang
Youngchae Ahn
Hyopil Shin
56
0
0
30 Oct 2025
e1: Learning Adaptive Control of Reasoning Effort
Michael Kleinman
Matthew Trager
Alessandro Achille
Wei Xia
Stefano Soatto
LRM
183
2
0
30 Oct 2025
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Andrew M. Bean
Nabeel Seedat
Shengzhuang Chen
Jonathan Richard Schwarz
56
0
0
30 Oct 2025
Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika
Alice Gatti
Cristina Menghini
Udari Madhushani Sehwag
Shivam Singhal
...
Summer Yue
Alexandr Wang
Bing Liu
Ernesto Hernandez
Dan Hendrycks
91
2
0
30 Oct 2025
Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
Chenming Tang
Hsiu-Yuan Huang
Weijie Liu
Saiyong Yang
Yunfang Wu
OffRL, LRM
108
0
0
30 Oct 2025
EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge
Jack FitzGerald
Aristotelis Lazaridis
Dylan Bates
Aman Sharma
Jonnathan Castillo
...
Dave Anderson
Jonathan Beck
Jamie Cuticello
Colton Malkerson
Tyler Saltsman
ELM
254
0
0
30 Oct 2025
Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu
T. Nguyen
LLMSV
252
3
0
30 Oct 2025
Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
Duc-Hai Nguyen
Vijayakumar Nanjappan
Barry O'Sullivan
Hoang D. Nguyen
100
0
0
30 Oct 2025
From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning
Nishit Neema
Srinjoy Mukherjee
Sapan Shah
Gokul Ramakrishnan
Ganesh Venkatesh
CLL
204
0
0
30 Oct 2025
Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
Biao Zhang
Yong Cheng
Siamak Shakeri
Xinyi Wang
Min Ma
Orhan Firat
85
0
0
30 Oct 2025
The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration
Kotaro Furuya
Yuichi Kitagawa
LLMAG, AI4CE
68
0
0
30 Oct 2025
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Usha Bhalla
Alex Oesterling
C. M. Verdun
Himabindu Lakkaraju
Flavio du Pin Calmon
48
0
0
30 Oct 2025
Revisiting Multilingual Data Mixtures in Language Model Pretraining
Negar Foroutan
Paul Teiletche
Ayush Kumar Tarun
Antoine Bosselut
LRM
64
1
0
29 Oct 2025
SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning
Aayush Aluru
Myra Malik
Samarth Patankar
Spencer Kim
Kevin Zhu
Sean O'Brien
Vasu Sharma
56
0
0
29 Oct 2025
A Survey on Unlearning in Large Language Models
Ruichen Qiu
Jiajun Tan
Jiayue Pu
Honglin Wang
Xiao-Shan Gao
Fei Sun
MU, AILaw, PILM
534
0
0
29 Oct 2025
SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications
Emily Herron
Junqi Yin
Feiyi Wang
HILM, ELM
331
0
0
29 Oct 2025
Are Language Models Efficient Reasoners? A Perspective from Logic Programming
Andreas Opedal
Yanick Zengaffinen
Haruki Shirakami
Clemente Pasti
Mrinmaya Sachan
Abulhair Saparov
Ryan Cotterell
Bernhard Schölkopf
ReLM, LRM
87
0
0
29 Oct 2025
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
Fali Wang
Jihai Chen
Shuhua Yang
Runxue Bao
Tianxiang Zhao
Zhiwei Zhang
Xianfeng Tang
Hui Liu
Qi He
Suhang Wang
56
0
0
29 Oct 2025
AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache
IACR Cryptology ePrint Archive (IACR ePrint), 2025
Dinghong Song
Yuan Feng
Y. Wang
S. Chen
Cyril Guyot
F. Blagojevic
Hyeran Jeon
Pengfei Su
Dong Li
111
0
0
29 Oct 2025
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher
Katharine Mach
Aleksandar Stanić
Markus Leippold
Ben Gaiarin
...
Massimiliano Ciaramita
Joeri Rogelj
Christian Buck
Lierni Sestorain Saralegui
Reto Knutti
HILM, ELM
209
0
0
29 Oct 2025
Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices
Špela Vintar
Taja Kuzman Pungeršek
Mojca Brglez
Nikola Ljubešić
139
0
0
28 Oct 2025
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Xuanzhong Chen
Zile Qiao
Guoxin Chen
L. Su
Zhen Zhang
Xinyu Wang
Pengjun Xie
Fei Huang
Jingren Zhou
Yong Jiang
LLMAG, ELM
89
2
0
28 Oct 2025
FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic
Kanghyun Choi
Hyeyoon Lee
S. Park
Dain Kwon
Jinho Lee
MQ
112
0
0
28 Oct 2025