GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

20 April 2018

Amanpreet Singh

Papers citing "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding"

50 / 4,808 papers shown

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

348

20 Oct 2025

Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey

365

20 Oct 2025

SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference

138

20 Oct 2025

DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge

136

19 Oct 2025

MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning

Vera Pavlova

Mohammed Makhlouf

CLL

156

19 Oct 2025

DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking

Lanni Bu

Lauren Levin

Amir Zeldes

167

19 Oct 2025

EditMark: Watermarking Large Language Models based on Model Editing

232

18 Oct 2025

What Limits Agentic Systems Efficiency?

Shivaram Venkataraman

LLMAG LRM

143

18 Oct 2025

MIN-Merging: Merge the Important Neurons for Model Merging

Yunfei Liang

MoMe

548

18 Oct 2025

Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures

Minh Khoi Nguyen Nhat

181

18 Oct 2025

KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models

150

17 Oct 2025

Zeroth-Order Sharpness-Aware Learning with Exponential Tilting

Xuchen Gong

Tian Li

148

17 Oct 2025

Expert Merging in Sparse Mixture of Experts with Nash Bargaining

193

17 Oct 2025

Towards Reversible Model Merging For Low-rank Weights

Mohammadsajad Alipour

Mohammad Mohammadi Amiri

MoMe

157

15 Oct 2025

Selective Adversarial Attacks on LLM Benchmarks

122

15 Oct 2025

Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM

...

198

15 Oct 2025

175

15 Oct 2025

FedHFT: Efficient Federated Finetuning with Heterogeneous Edge Clients

183

15 Oct 2025

ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding

203

15 Oct 2025

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

208

15 Oct 2025

Chimera: State Space Models Beyond Sequences

264

14 Oct 2025

SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression

132

14 Oct 2025

Layer-Aware Influence for Online Data Valuation Estimation

260

14 Oct 2025

Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning

Dean L. Slack

Noura Al Moubayed

117

13 Oct 2025

Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning

Dongkwan Lee

Junhoo Lee

Nojun Kwak

436

13 Oct 2025

Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

319

12 Oct 2025

Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?

...

152

12 Oct 2025

PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models

11 Oct 2025

HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

224

11 Oct 2025

PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration

301

11 Oct 2025

Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning

117

10 Oct 2025

AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models

123

09 Oct 2025

DISCO: Diversifying Sample Condensation for Efficient Model Evaluation

383

09 Oct 2025

Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models

S M Rafiuddin

Muntaha Nujat Khan

RALM KELM

142

09 Oct 2025

SliceFine: The Universal Winning-Slice Hypothesis for Pretrained Networks

Md. Kowsher

Ali O. Polat

Ehsan Mohammady Ardehaly

189

09 Oct 2025

Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors

Vasileios Titopoulos

K. Alexandridis

G. Dimitrakopoulos

111

08 Oct 2025

Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain

156

08 Oct 2025

Reasoning for Hierarchical Text Classification: The Case of Patents

149

08 Oct 2025

Benchmarking is Broken -- Don't Let AI be its Own Judge

...

154

08 Oct 2025

Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream Tasks

119

08 Oct 2025

OBSR: Open Benchmark for Spatial Representations

154

07 Oct 2025

Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models

182

07 Oct 2025

Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation

222

06 Oct 2025

Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models

197

06 Oct 2025

Boomerang Distillation Enables Zero-Shot Model Size Interpolation

158

06 Oct 2025

COLE: a Comprehensive Benchmark for French Language Understanding Evaluation

David Beauchemin

Yan Tremblay

Mohamed Amine Youssef

Richard Khoury

ELM

307

06 Oct 2025

A Set of Quebec-French Corpus of Regional Expressions and Terms

David Beauchemin

Yan Tremblay

Mohamed Amine Youssef

Richard Khoury

135

06 Oct 2025

SocialNLI: A Dialogue-Centric Social Inference Dataset

Akhil Deo

Kate Sanders

Benjamin Van Durme

142

06 Oct 2025

Modeling Time Series Dynamics with Fourier Ordinary Differential Equations

Muhao Guo

Yang Weng

AI4TS

156

05 Oct 2025

Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

139

05 Oct 2025