Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021

7 September 2020

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,483 papers shown

Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning

Sai Teja Reddy Adapala

LRM ELM

199

23 Sep 2025

Evaluating the Safety and Skill Reasoning of Large Reasoning Models Under Compute Constraints

22 Sep 2025

MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents

22 Sep 2025

PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models

143

21 Sep 2025

seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs

21 Sep 2025

Probabilistic Token Alignment for Large Language Model Fusion

...

164

21 Sep 2025

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories

106

20 Sep 2025

Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle

227

20 Sep 2025

LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts

188

20 Sep 2025

Challenging the Evaluator: LLM Sycophancy Under User Rebuttal

Sungwon Kim

Daniel Khashabi

ELM

122

20 Sep 2025

Can an Individual Manipulate the Collective Decisions of Multi-Agents?

210

20 Sep 2025

GPO: Learning from Critical Steps to Improve LLM Reasoning

194

19 Sep 2025

DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

118

19 Sep 2025

SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection

105

19 Sep 2025

Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models

Jose Hernandez-Orallo

136

19 Sep 2025

Robust LLM Training Infrastructure at ByteDance

Symposium on Operating Systems Principles (SOSP), 2025

...

351

19 Sep 2025

RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering

347

19 Sep 2025

Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research

Richard Diehl Martinez

144

19 Sep 2025

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models

144

18 Sep 2025

Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction

143

18 Sep 2025

The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior

193

18 Sep 2025

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

219

18 Sep 2025

KAIO: A Collection of More Challenging Korean Questions

18 Sep 2025

ReCoVeR the Target Language: Language Steering without Sacrificing Task Performance

141

18 Sep 2025

Quantifying Self-Awareness of Knowledge in Large Language Models

121

18 Sep 2025

Rationality Check! Benchmarking the Rationality of Large Language Models

153

18 Sep 2025

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

160

18 Sep 2025

Enhancing Retrieval Augmentation via Adversarial Collaboration

112

18 Sep 2025

Synthetic bootstrapped pretraining

294

17 Sep 2025

ZERA: Zero-init Instruction Evolving Refinement Agent - From Zero Instructions to Structured Prompts via Principle-based Optimization

17 Sep 2025

DSFT: Inspiring Diffusion Large Language Models to Comprehend Mathematical and Logical Patterns

Ranfei Chen

Ming Chen

DiffM AI4CE

17 Sep 2025

Do Large Language Models Understand Word Senses?

137

17 Sep 2025

GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing

195

17 Sep 2025

Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

234

17 Sep 2025

SAIL-VL2 Technical Report

...

297

17 Sep 2025

Enhancing Multi-Agent Debate System Performance via Confidence Expression

Zijie Lin

Bryan Hooi

LLMAG

108

17 Sep 2025

Don't Forget the Nonlinearity: Unlocking Activation Functions in Efficient Fine-Tuning

Bo Yin

Xingyi Yang

Xinchao Wang

133

16 Sep 2025

Towards mitigating information leakage when evaluating safety monitors

141

16 Sep 2025

The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features

Jeremias Lino Ferrao

Matthijs van der Lende

Ilija Lichkovski

Clement Neo

LLMSV

256

16 Sep 2025

Bhaasha, Bhasa, Zaban: A Survey for Low-Resourced Languages in South Asia - Current Stage and Challenges

Sampoorna Poria

Xiaolei Huang

202

15 Sep 2025

AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models

116

15 Sep 2025

MALLM: Multi-Agent Large Language Models Framework

Jonas Becker

Lars Benedikt Kaesberg

238

15 Sep 2025

CBP-Tuning: Efficient Local Customization for Black-box Large Language Models

112

15 Sep 2025

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

143

15 Sep 2025

Preservation of Language Understanding Capabilities in Speech-aware Large Language Models

Marek Kubis

Paweł Skórzewski

Iwona Christop

Mateusz Czyżnikiewicz

190

15 Sep 2025

Fluid Language Model Benchmarking

135

14 Sep 2025

From Parameters to Performance: A Data-Driven Study on LLM Structure and Development

134

14 Sep 2025

Free-MAD: Consensus-Free Multi-Agent Debate

163

14 Sep 2025

Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction

150

13 Sep 2025

CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis

136

13 Sep 2025