Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021

7 September 2020

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown

Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior

152

16 Oct 2025

NOSA: Native and Offloadable Sparse Attention

Zhou Su

...

Hengyu Zhao

Yudong Wang

Chaojun Xiao

Xu Han

Zhiyuan Liu

175

15 Oct 2025

Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism

231

15 Oct 2025

To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models

219

15 Oct 2025

BioMedSearch: A Multi-Source Biomedical Retrieval Framework Based on LLMs

139

15 Oct 2025

ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding

204

15 Oct 2025

REAP the Experts: Why Pruning Prevails for One-Shot MoE compression

122

15 Oct 2025

GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models

196

15 Oct 2025

Selective Adversarial Attacks on LLM Benchmarks

122

15 Oct 2025

Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models

202

15 Oct 2025

End-to-End Multi-Modal Diffusion Mamba

141

15 Oct 2025

In-Distribution Steering: Balancing Control and Coherence in Language Model Generation

228

15 Oct 2025

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

212

15 Oct 2025

CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning

184

15 Oct 2025

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

191

15 Oct 2025

Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps

Ahmed Alzubaidi

Shaikha Alsuwaidi

Basma El Amel Boussaha

163

15 Oct 2025

RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

124

15 Oct 2025

Adaptive Reasoning Executor: A Collaborative Agent System for Efficient Reasoning

110

15 Oct 2025

Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM

...

202

15 Oct 2025

Dr.LLM: Dynamic Layer Routing in LLMs

378

14 Oct 2025

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

...

169

14 Oct 2025

Evolution of Meta's LLaMA Models and Parameter-Efficient Fine-Tuning of Large Language Models: A Survey

Abdulhady Abas Abdullah

193

14 Oct 2025

LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens

146

13 Oct 2025

ADVICE: Answer-Dependent Verbalized Confidence Estimation

Ki Jung Seo

Sehun Lim

Taeuk Kim

13 Oct 2025

Beyond Consensus: Mitigating the Agreeableness Bias in LLM Judge Evaluations

13 Oct 2025

PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature

Qi Liu

308

13 Oct 2025

LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance

135

13 Oct 2025

Neural Weight Compression for Language Models

140

13 Oct 2025

DND: Boosting Large Language Models with Dynamic Nested Depth

237

13 Oct 2025

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning

...

290

13 Oct 2025

Balancing Synthetic Data and Replay for Enhancing Task-Specific Capabilities

140

13 Oct 2025

LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models

13 Oct 2025

Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks

...

201

13 Oct 2025

MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models

176

13 Oct 2025

APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport

127

13 Oct 2025

Harnessing Consistency for Robust Test-Time LLM Ensemble

147

12 Oct 2025

RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

Zichun Yu

Chenyan Xiong

236

12 Oct 2025

HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication

137

12 Oct 2025

D3MAS: Decompose, Deduce, and Distribute for Enhanced Knowledge Sharing in Multi-Agent Systems

136

12 Oct 2025

AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs

201

12 Oct 2025

Trace Length is a Simple Uncertainty Signal in Reasoning Models

148

12 Oct 2025

MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision

Hongjie Zheng

Zesheng Shi

Ping Yi

132

12 Oct 2025

Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?

...

156

12 Oct 2025

Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?

118

12 Oct 2025

SASER: Stego attacks on open-source LLMs

171

12 Oct 2025

Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models

11 Oct 2025

EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing

11 Oct 2025

CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models

136

11 Oct 2025

PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration

305

11 Oct 2025

MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning

...

140

11 Oct 2025