Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,481 papers shown
KV Cache Transform Coding for Compact Storage in LLM Inference
Konrad Staniszewski
Adrian Łańcucki
VLM
425
0
0
03 Nov 2025
Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation
Hung-Shin Lee
Chen-Chi Chang
Ching-Yuan Chen
Yun-Hsiang Hsu
117
0
0
03 Nov 2025
EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning
Ayesha Gull
Muhammad Usman Safder
Rania Elbadry
Preslav Nakov
Zhuohan Xie
ELM, LRM
220
0
0
03 Nov 2025
A Detailed Study on LLM Biases Concerning Corporate Social Responsibility and Green Supply Chains
Greta Ontrup
Annika Bush
Markus Pauly
Meltem Aksoy
123
0
0
03 Nov 2025
The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation
İbrahim Ethem Deveci
Duygu Ataman
ReLM, ALM, ELM, LRM
215
0
0
03 Nov 2025
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Md Tanvirul Alam
Dipkamal Bhusal
Salman Ahmad
Nidhi Rastogi
Peter Worth
ELM
214
0
0
03 Nov 2025
Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI
Sharan Maiya
Henning Bartsch
Nathan Lambert
Evan Hubinger
116
1
0
03 Nov 2025
Assessing LLM Reasoning Steps via Principal Knowledge Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Hyeon Hwang
Yewon Cho
Chanwoong Yoon
Yein Park
Minju Song
Kyungjae Lee
Gangwoo Kim
Jaewoo Kang
ELM, LRM
279
0
0
02 Nov 2025
Two Datasets Are Better Than One: Method of Double Moments for 3-D Reconstruction in Cryo-EM
Joe Kileel
Oscar Mickelin
A. Singer
Sheng Xu
125
0
0
02 Nov 2025
Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering
Vlad Negoita
Mihai Masala
Traian Rebedea
123
0
0
02 Nov 2025
A CPU-Centric Perspective on Agentic AI
Ritik Raj
Hong Wang
Tushar Krishna
295
0
0
01 Nov 2025
HIP-LLM: A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models
Robab Aghazadeh-Chakherlou
Qing Guo
Siddartha Khastgir
Peter Popov
Xiaoge Zhang
Xingyu Zhao
145
0
0
01 Nov 2025
Language Modeling With Factorization Memory
Lee Xiong
Maksim Tkachenko
Johanes Effendi
Ting Cai
RALM, LRM
229
0
0
31 Oct 2025
Calibration Across Layers: Understanding Calibration Evolution in LLMs
Abhinav Joshi
A. Ahmad
Ashutosh Modi
UQCV
320
2
0
31 Oct 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
589
4
0
31 Oct 2025
TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control
Yuxiang Chen
Xiaoming Xu
Pengle Zhang
Michael Beyer
Martin Rapp
Jun Zhu
Jianfei Chen
MQ
152
1
0
31 Oct 2025
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan
Alexander Matt Turner
Mark Kurzeja
David Elson
Rohin Shah
237
0
0
31 Oct 2025
Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
Deokhyung Kang
Seonjeong Hwang
Daehui Kim
Hyounghun Kim
Gary Geunbae Lee
LRM
172
2
0
31 Oct 2025
Thought Branches: Interpreting LLM Reasoning Requires Resampling
Uzay Macar
Paul C. Bogdan
Senthooran Rajamanoharan
Neel Nanda
LRM
101
0
0
31 Oct 2025
OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education
Min Zhang
Hao Chen
Hao Chen
Wenqi Zhang
Didi Zhu
Xin Lin
Bo Jiang
Aimin Zhou
Fei Wu
Kun Kuang
ELM
161
0
0
30 Oct 2025
Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
Duc-Hai Nguyen
Vijayakumar Nanjappan
Barry O'Sullivan
Hoang D. Nguyen
125
0
0
30 Oct 2025
The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration
Kotaro Furuya
Yuichi Kitagawa
LLMAG, AI4CE
96
0
0
30 Oct 2025
From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning
Nishit Neema
Srinjoy Mukherjee
Sapan Shah
Gokul Ramakrishnan
Ganesh Venkatesh
CLL
263
0
0
30 Oct 2025
Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu
T. Nguyen
LLMSV
338
3
0
30 Oct 2025
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Andrew M. Bean
Nabeel Seedat
Shengzhuang Chen
Jonathan Richard Schwarz
95
1
0
30 Oct 2025
RCScore: Quantifying Response Consistency in Large Language Models
Dongjun Jang
Youngchae Ahn
Hyopil Shin
140
0
0
30 Oct 2025
EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge
Jack FitzGerald
Aristotelis Lazaridis
Dylan Bates
Aman Sharma
Jonnathan Castillo
...
Dave Anderson
Jonathan Beck
Jamie Cuticello
Colton Malkerson
Tyler Saltsman
ELM
320
0
0
30 Oct 2025
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Usha Bhalla
Alex Oesterling
C. M. Verdun
Himabindu Lakkaraju
Flavio du Pin Calmon
83
0
0
30 Oct 2025
Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model
Biao Zhang
Yong Cheng
Siamak Shakeri
Xinyi Wang
Min Ma
Orhan Firat
147
1
0
30 Oct 2025
Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
Chenming Tang
Hsiu-Yuan Huang
Weijie Liu
Saiyong Yang
Yunfang Wu
OffRL, LRM
149
2
0
30 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
138
11
0
30 Oct 2025
Value Drifts: Tracing Value Alignment During LLM Post-Training
Mehar Bhatia
Shravan Nayak
Gaurav Kamath
Marius Mosbach
Karolina Stañczak
Vered Shwartz
Siva Reddy
161
2
0
30 Oct 2025
e1: Learning Adaptive Control of Reasoning Effort
Michael Kleinman
Matthew Trager
Alessandro Achille
Wei Xia
Stefano Soatto
LRM
240
2
0
30 Oct 2025
Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika
Alice Gatti
Cristina Menghini
Udari Madhushani Sehwag
Shivam Singhal
...
Summer Yue
Alexandr Wang
Bing Liu
Ernesto Hernandez
Dan Hendrycks
147
3
0
30 Oct 2025
Revisiting Multilingual Data Mixtures in Language Model Pretraining
Negar Foroutan
Paul Teiletche
Ayush Kumar Tarun
Antoine Bosselut
LRM
93
1
0
29 Oct 2025
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher
Katharine Mach
Aleksandar Stanić
Markus Leippold
Ben Gaiarin
...
Massimiliano Ciaramita
Joeri Rogelj
Christian Buck
Lierni Sestorain Saralegui
Reto Knutti
HILM, ELM
319
0
0
29 Oct 2025
AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache
IACR Cryptology ePrint Archive (IACR ePrint), 2025
Dinghong Song
Yuan Feng
Y. Wang
S. Chen
Cyril Guyot
F. Blagojevic
Hyeran Jeon
Pengfei Su
Dong Li
214
0
0
29 Oct 2025
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
Fali Wang
Jihai Chen
Shuhua Yang
Runxue Bao
Tianxiang Zhao
Zhiwei Zhang
Xianfeng Tang
Hui Liu
Qi He
Suhang Wang
118
0
0
29 Oct 2025
A Survey on Unlearning in Large Language Models
Ruichen Qiu
Jiajun Tan
Jiayue Pu
Honglin Wang
Xiao-Shan Gao
Fei Sun
MU, AILaw, PILM
665
0
0
29 Oct 2025
Are Language Models Efficient Reasoners? A Perspective from Logic Programming
Andreas Opedal
Yanick Zengaffinen
Haruki Shirakami
Clemente Pasti
Mrinmaya Sachan
Abulhair Saparov
Ryan Cotterell
Bernhard Schölkopf
ReLM, LRM
158
0
0
29 Oct 2025
SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications
Emily Herron
Junqi Yin
Feiyi Wang
HILM, ELM
458
0
0
29 Oct 2025
SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning
Aayush Aluru
Myra Malik
Samarth Patankar
Spencer Kim
Kevin Zhu
Sean O'Brien
Vasu Sharma
104
0
0
29 Oct 2025
ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?
Christine Ye
Sihan Yuan
Suchetha Cooray
Steven Dillmann
Ian L. V. Roque
...
Nolan Koblischke
Frank J Qu
Diyi Yang
Risa Wechsler
Ioana Ciuca
133
0
0
28 Oct 2025
Relative Scaling Laws for LLMs
William B. Held
David Leo Wright Hall
Abigail Z. Jacobs
Diyi Yang
142
1
0
28 Oct 2025
FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic
Kanghyun Choi
Hyeyoon Lee
S. Park
Dain Kwon
Jinho Lee
MQ
172
0
0
28 Oct 2025
ChessQA: Evaluating Large Language Models for Chess Understanding
Qianfeng Wen
Zhenwei Tang
Ashton Anderson
ELM, LRM
197
1
0
28 Oct 2025
MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling
Yuxi Liu
Renjia Deng
Yutong He
Xue Wang
Tao Yao
Kun Yuan
148
0
0
28 Oct 2025
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
T. Chang
Catherine Arnett
Abdelrahman Eldesokey
Abdelrahman Sadallah
Abeer Kashar
...
Francesco Orabona
Francesco Periti
Gbenga Kayode Solomon
Gia Nghia Ngo
Gloria Udhehdhe-oze
LRM, ELM
170
1
0
28 Oct 2025
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Xuanzhong Chen
Zile Qiao
Guoxin Chen
L. Su
Zhen Zhang
Xinyu Wang
Pengjun Xie
Fei Huang
Jingren Zhou
Yong Jiang
LLMAG, ELM
167
3
0
28 Oct 2025
Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices
Špela Vintar
Taja Kuzman Pungeršek
Mojca Brglez
Nikola Ljubešić
183
0
0
28 Oct 2025