Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020

7 September 2020

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,481 papers shown

KV Cache Transform Coding for Compact Storage in LLM Inference

Konrad Staniszewski

Adrian Łańcucki

VLM

425

03 Nov 2025

Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation

117

03 Nov 2025

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

Ayesha Gull

Muhammad Usman Safder

Rania Elbadry

Preslav Nakov

Zhuohan Xie

ELM LRM

220

03 Nov 2025

A Detailed Study on LLM Biases Concerning Corporate Social Responsibility and Green Supply Chains

123

03 Nov 2025

The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation

İbrahim Ethem Deveci

Duygu Ataman

ReLM ALM ELM LRM

215

03 Nov 2025

AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

214

03 Nov 2025

Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI

116

03 Nov 2025

Assessing LLM Reasoning Steps via Principal Knowledge Grounding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

279

02 Nov 2025

Two Datasets Are Better Than One: Method of Double Moments for 3-D Reconstruction in Cryo-EM

125

02 Nov 2025

Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering

Vlad Negoita

Mihai Masala

Traian Rebedea

123

02 Nov 2025

A CPU-Centric Perspective on Agentic AI

Ritik Raj

Hong Wang

Tushar Krishna

295

01 Nov 2025

HIP-LLM: A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models

Robab Aghazadeh-Chakherlou

145

01 Nov 2025

Language Modeling With Factorization Memory

229

31 Oct 2025

Calibration Across Layers: Understanding Calibration Evolution in LLMs

320

31 Oct 2025

LongCat-Flash-Omni Technical Report

...

589

31 Oct 2025

TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control

152

31 Oct 2025

Consistency Training Helps Stop Sycophancy and Jailbreaks

Alex Irpan

Alexander Matt Turner

Mark Kurzeja

David Elson

Rohin Shah

237

31 Oct 2025

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

172

31 Oct 2025

Thought Branches: Interpreting LLM Reasoning Requires Resampling

Uzay Macar

Paul C. Bogdan

Senthooran Rajamanoharan

Neel Nanda

LRM

101

31 Oct 2025

OmniEduBench: A Comprehensive Chinese Benchmark for Evaluating Large Language Models in Education

161

30 Oct 2025

Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses

Duc-Hai Nguyen

Vijayakumar Nanjappan

Barry O'Sullivan

Hoang D. Nguyen

125

30 Oct 2025

The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration

Kotaro Furuya

Yuichi Kitagawa

LLMAG AI4CE

30 Oct 2025

From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning

263

30 Oct 2025

Angular Steering: Behavior Control via Rotation in Activation Space

Hieu M. Vu

T. Nguyen

LLMSV

338

30 Oct 2025

Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings

Andrew M. Bean

Nabeel Seedat

Shengzhuang Chen

Jonathan Richard Schwarz

30 Oct 2025

RCScore: Quantifying Response Consistency in Large Language Models

Dongjun Jang

Youngchae Ahn

Hyopil Shin

140

30 Oct 2025

EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge

Jack FitzGerald

Aristotelis Lazaridis

...

320

30 Oct 2025

Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability

30 Oct 2025

Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

147

30 Oct 2025

Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error

149

30 Oct 2025

Kimi Linear: An Expressive, Efficient Attention Architecture

...

138

30 Oct 2025

Value Drifts: Tracing Value Alignment During LLM Post-Training

161

30 Oct 2025

e1: Learning Adaptive Control of Reasoning Effort

240

30 Oct 2025

Remote Labor Index: Measuring AI Automation of Remote Work

Mantas Mazeika

Alice Gatti

Cristina Menghini

Udari Madhushani Sehwag

...

147

30 Oct 2025

Revisiting Multilingual Data Mixtures in Language Model Pretraining

29 Oct 2025

CLINB: A Climate Intelligence Benchmark for Foundational Models

Michelle Chen Huebscher

...

Massimiliano Ciaramita

Joeri Rogelj

Christian Buck

Lierni Sestorain Saralegui

Reto Knutti

HILM ELM

319

29 Oct 2025

AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache

IACR Cryptology ePrint Archive (IACR ePrint), 2025

214

29 Oct 2025

Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

118

29 Oct 2025

A Survey on Unlearning in Large Language Models

665

29 Oct 2025

Are Language Models Efficient Reasoners? A Perspective from Logic Programming

158

29 Oct 2025

SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications

458

29 Oct 2025

SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning

104

29 Oct 2025

ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?

...

133

28 Oct 2025

Relative Scaling Laws for LLMs

William B. Held

David Leo Wright Hall

Abigail Z. Jacobs

Diyi Yang

142

28 Oct 2025

FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic

172

28 Oct 2025

ChessQA: Evaluating Large Language Models for Chess Understanding

197

28 Oct 2025

MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling

148

28 Oct 2025

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

T. Chang

Catherine Arnett

Abdelrahman Eldesokey

...

Gbenga Kayode Solomon

Gia Nghia Ngo

Gloria Udhehdhe-oze

LRM ELM

170

28 Oct 2025

AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis

167

28 Oct 2025

Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices

Špela Vintar

Taja Kuzman Pungeršek

Mojca Brglez

Nikola Ljubešić

183

28 Oct 2025