Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021

7 September 2020

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,481 papers shown

MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling

143

28 Oct 2025

Parallel Loop Transformer for Efficient Test-Time Computation Scaling

...

116

28 Oct 2025

Multi-Agent Evolve: LLM Self-Improve through Co-evolution

295

27 Oct 2025

Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining

Xiaofan Zhou

Lu Cheng

CLL

378

27 Oct 2025

Probing Knowledge Holes in Unlearned LLMs

302

27 Oct 2025

A Survey on LLM Mid-Training

239

27 Oct 2025

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

372

27 Oct 2025

Knocking-Heads Attention

27 Oct 2025

PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models

373

27 Oct 2025

Offline Preference Optimization via Maximum Marginal Likelihood Estimation

Saeed Najafi

Alona Fyshe

OffRL

144

27 Oct 2025

Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

100

26 Oct 2025

Frustratingly Easy Task-aware Pruning for Large Language Models

136

26 Oct 2025

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

Omar Naim

Krish Sharma

Nicholas M. Asher

26 Oct 2025

Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints

123

26 Oct 2025

Leveraging Large Language Models to Identify Conversation Threads in Collaborative Learning

26 Oct 2025

SeeDNorm: Self-Rescaled Dynamic Normalization

145

26 Oct 2025

Adaptive Testing for LLM Evaluation: A Psychometric Alternative to Static Benchmarks

26 Oct 2025

Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs

352

25 Oct 2025

When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs

117

25 Oct 2025

The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models

25 Oct 2025

Model-Aware Tokenizer Transfer

Mykola Haltiuk

Aleksander Smywiński-Pohl

120

24 Oct 2025

A Diagnostic Benchmark for Sweden-Related Factual Knowledge

Jenny Kunz

HILM

179

24 Oct 2025

δ-STEAL: LLM Stealing Attack with Local Differential Privacy

132

24 Oct 2025

Transformer Based Linear Attention with Optimized GPU Kernel Implementation

Armin Gerami

R. Duraiswami

143

24 Oct 2025

Risk Management for Mitigating Benchmark Failure Modes: BenchRisk

...

145

24 Oct 2025

Model Merging with Functional Dual Anchors

272

24 Oct 2025

Estonian Native Large Language Model Benchmark

Helena Grete Lillepalu

Tanel Alumäe

ELM

116

24 Oct 2025

Chain of Execution Supervision Promotes General Reasoning in Large Language Models

118

24 Oct 2025

On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?

Mingmeng Geng

Thierry Poibeau

DeLMO

217

23 Oct 2025

Robust Preference Alignment via Directional Neighborhood Consensus

173

23 Oct 2025

CantoNLU: A benchmark for Cantonese natural language understanding

120

23 Oct 2025

What Does It Take to Build a Performant Selective Classifier?

Stephan Rabanser

Nicolas Papernot

210

23 Oct 2025

Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs

Yanlin Song

Ben Liu

Víctor Gutiérrez-Basulto

279

23 Oct 2025

LM-mixup: Text Data Augmentation via Language Model based Mixup

23 Oct 2025

Capability Ceilings in Autoregressive Language Models: Empirical Evidence from Knowledge-Intensive Tasks

Javier Marín

23 Oct 2025

ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows

...

328

23 Oct 2025

The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts

119

23 Oct 2025

DiSRouter: Distributed Self-Routing for LLM Selections

132

22 Oct 2025

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

241

22 Oct 2025

What is the Best Sequence Length for BABYLM?

Suchir Salhan

Richard Diehl Martinez

Zébulon Goriely

P. Buttery

103

22 Oct 2025

Data-Centric Lessons To Improve Speech-Language Pretraining

136

22 Oct 2025

LLM Unlearning with LLM Beliefs

201

22 Oct 2025

Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges

...

116

22 Oct 2025

Beyond MedQA: Towards Real-world Clinical Decision Making in the Era of LLMs

264

22 Oct 2025

Teaming LLMs to Detect and Mitigate Hallucinations

319

22 Oct 2025

WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection

105

21 Oct 2025

From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering

156

21 Oct 2025

Investigating LLM Capabilities on Long Context Comprehension for Medical Question Answering

191

21 Oct 2025

Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation

155

21 Oct 2025

Some Attention is All You Need for Retrieval

Felix Michalak

Steven Abreu

21 Oct 2025