Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020

7 September 2020

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,483 papers shown

Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs

164

12 Sep 2025

VARCO-VISION-2.0 Technical Report

215

12 Sep 2025

Automated MCQA Benchmarking at Scale: Evaluating Reasoning Traces as Retrieval Sources for Domain Adaptation of Small Language Models

112

12 Sep 2025

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

137

12 Sep 2025

Towards Understanding Visual Grounding in Visual Language Models

Georgios Pantazopoulos

Eda B. Özyiğit

ObjD

320

12 Sep 2025

Measuring Epistemic Humility in Multimodal Large Language Models

143

11 Sep 2025

TORSO: Template-Oriented Reasoning Towards General Tasks

189

11 Sep 2025

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

300

11 Sep 2025

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

192

11 Sep 2025

Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning

142

10 Sep 2025

Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism

127

10 Sep 2025

Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison

178

10 Sep 2025

Causal Attention with Lookahead Keys

189

09 Sep 2025

Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents

Sankalp Tattwadarshi Swain

09 Sep 2025

Performance Assessment Strategies for Generative AI Applications in Healthcare

Victor Garcia

Mariia Sidulova

Aldo Badano

141

09 Sep 2025

MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations

Ruggero Marino Lazzaroni

148

08 Sep 2025

Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing

178

08 Sep 2025

COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens

Eugene Kwek

Wenpeng Yin

VLM

265

08 Sep 2025

Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian

162

06 Sep 2025

Hyperbolic Large Language Models

215

06 Sep 2025

PLaMo 2 Technical Report

...

123

05 Sep 2025

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts

105

05 Sep 2025

Knowledge Collapse in LLMs: When Fluency Survives but Facts Fail under Recursive Synthetic Training

Figarri Keisha

Zekun Wu

Ze Wang

Adriano Soares Koshiyama

Philip C. Treleaven

KELM

178

05 Sep 2025

Personality as a Probe for LLM Evaluation: Method Trade-offs and Downstream Effects

Gunmay Handa

Zekun Wu

Adriano Soares Koshiyama

Philip C. Treleaven

126

05 Sep 2025

Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate

Andrea Wynn

Harsh Satija

Gillian Hadfield

175

05 Sep 2025

Hunyuan-MT Technical Report

137

05 Sep 2025

Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too

113

05 Sep 2025

What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking

330

05 Sep 2025

Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning

Wei Yang

Jesse Thomason

192

04 Sep 2025

On Robustness and Reliability of Benchmark-Based Evaluation of LLMs

165

04 Sep 2025

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment

...

178

04 Sep 2025

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

256

04 Sep 2025

RL's Razor: Why Online Reinforcement Learning Forgets Less

192

04 Sep 2025

Set Block Decoding is a Language Model Inference Accelerator

150

04 Sep 2025

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

...

135

04 Sep 2025

Adaptive Preference Optimization with Uncertainty-aware Utility Anchor

104

03 Sep 2025

SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala

156

03 Sep 2025

Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving

Fangzhou Wu

Sandeep Silwal

235

02 Sep 2025

Behavioral Fingerprinting of Large Language Models

02 Sep 2025

Perturbing the Derivative: Wild Refitting for Model-Free Evaluation of Machine Learning Models under Bregman Losses

Haichen Hu

David Simchi-Levi

454

02 Sep 2025

Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs

195

02 Sep 2025

JudgeAgent: Beyond Static Benchmarks for Knowledge-Driven and Dynamic LLM Evaluation

296

02 Sep 2025

Implicit Reasoning in Large Language Models: A Comprehensive Survey

229

02 Sep 2025

LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference

Krishna Teja Chitty-Venkata

160

02 Sep 2025

Dream-Coder 7B: An Open Diffusion Language Model for Code

...

139

01 Sep 2025

LongCat-Flash Technical Report

...

403

01 Sep 2025

An LLM-enabled semantic-centric framework to consume privacy policies

157

01 Sep 2025

Culture is Everywhere: A Call for Intentionally Cultural Evaluation

207

01 Sep 2025

REFRAG: Rethinking RAG based Decoding

Xiaoqiang Lin

Aritra Ghosh

Bryan Kian Hsiang Low

Anshumali Shrivastava

Vijai Mohan

LLMAG

229

01 Sep 2025

Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs

118

01 Sep 2025