1804.07461
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
20 April 2018
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
Papers citing
"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding"
50 / 4,808 papers shown
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu
Joachim Baumann
Lorenzo Lupo
Nigel Collier
Dirk Hovy
Paul Röttger
ALM
343
7
0
20 Oct 2025
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Weifan Guan
Qinghao Hu
Aosheng Li
Jian Cheng
LM&Ro
364
8
0
20 Oct 2025
SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
Wenxun Wang
Shuchang Zhou
Wenyu Sun
Peiqin Sun
Y. Liu
138
38
0
20 Oct 2025
DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
Asmita Mohanty
Gezheng Kang
Lei Gao
M. Annavaram
135
0
0
19 Oct 2025
MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning
Vera Pavlova
Mohammed Makhlouf
CLL
156
0
0
19 Oct 2025
DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking
Lanni Bu
Lauren Levin
Amir Zeldes
166
1
0
19 Oct 2025
EditMark: Watermarking Large Language Models based on Model Editing
Shuai Li
Kejiang Chen
Jun Jiang
Jie Zhang
Qiyi Yao
K. Zeng
W. Zhang
N. Yu
WaLM
KELM
232
0
0
18 Oct 2025
What Limits Agentic Systems Efficiency?
S. Bian
Minghao Yan
Anand Jayarajan
Gennady Pekhimenko
Shivaram Venkataraman
LLMAG
LRM
142
1
0
18 Oct 2025
MIN-Merging: Merge the Important Neurons for Model Merging
Yunfei Liang
MoMe
547
0
0
18 Oct 2025
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
Minh Khoi Nguyen Nhat
R. Teo
Laziz U. Abdullaev
Maurice Mok
Viet-Hoang Tran
T. Nguyen
MoE
181
0
0
18 Oct 2025
KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models
Dongjun Kim
Chanhee Park
Chanjun Park
Heuiseok Lim
ALM
ELM
150
0
0
17 Oct 2025
Zeroth-Order Sharpness-Aware Learning with Exponential Tilting
Xuchen Gong
Tian Li
148
0
0
17 Oct 2025
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
Dung V. Nguyen
Anh T. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
Shiqi Jiang
Ethan Fetaya
Linh Duy Tran
Gal Chechik
T. Nguyen
MoMe
193
1
0
17 Oct 2025
Towards Reversible Model Merging For Low-rank Weights
Mohammadsajad Alipour
Mohammad Mohammadi Amiri
MoMe
157
0
0
15 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
122
0
0
15 Oct 2025
Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM
Areej AlOtaibi
Lina Alyahya
Raghad Alshabanah
Shahad Alfawzan
Shuruq Alarefei
...
Waad Alahmed
Omar Talabay
Jalal Alowibdi
Salem Alelyani
Adel Bibi
198
0
0
15 Oct 2025
BitNet Distillation
Xun Wu
Shaohan Huang
Wenhui Wang
Ting Song
Li Dong
Yan Xia
Furu Wei
MQ
175
0
0
15 Oct 2025
FedHFT: Efficient Federated Finetuning with Heterogeneous Edge Clients
Fatih Ilhan
Selim Furkan Tekin
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Greg Eisenhauer
Yingyan Celine Lin
C. Pu
Ling Liu
FedML
182
0
0
15 Oct 2025
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Xiaozhe Li
TianYi Lyu
Siyi Yang
Yuxi Gong
Yizhao Yang
Jinxuan Huang
Ligao Zhang
Zhuoyi Huang
Qingwen Liu
ELM
203
0
0
15 Oct 2025
PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
Wei Zou
Yupei Liu
Yanting Wang
Ying Chen
Neil Zhenqiang Gong
Jinyuan Jia
AAML
208
0
0
15 Oct 2025
Chimera: State Space Models Beyond Sequences
Aakash Lahoti
Tanya Marwah
Ratish Puduppully
Albert Gu
Mamba
GNN
AI4CE
262
1
0
14 Oct 2025
SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression
Biao Zhang
Lixin Chen
Tong Liu
Bo Zheng
132
1
0
14 Oct 2025
Layer-Aware Influence for Online Data Valuation Estimation
Ziao Yang
Longbo Huang
Hongfu Liu
TDI
260
0
0
14 Oct 2025
Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning
Dean L. Slack
Noura Al Moubayed
117
0
0
13 Oct 2025
Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
Dongkwan Lee
Junhoo Lee
Nojun Kwak
436
0
0
13 Oct 2025
Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods
Andrey Veprikov
Arman Bolatov
Samuel Horváth
Aleksandr Beznosikov
Martin Takáč
Slavomír Hanzely
ODL
318
0
0
12 Oct 2025
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
Shaobo Wang
C. Wang
Wenjie Fu
Yue Min
Mingquan Feng
...
Kexin Yang
Xingzhang Ren
Fei Huang
Dayiheng Liu
Linfeng Zhang
152
0
0
12 Oct 2025
PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models
Lancheng Zou
Shuo Yin
Zehua Pei
Tsung-Yi Ho
Farzan Farnia
Bei Yu
87
0
0
11 Oct 2025
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
Adnan El Assadi
Isaac Chung
Roman Solomatin
Niklas Muennighoff
Kenneth Enevoldsen
220
0
0
11 Oct 2025
PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
Manjiang Yu
Hongji Li
Priyanka Singh
X. Li
Di Wang
Lijie Hu
LLMSV
300
4
0
11 Oct 2025
Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning
Minsik Choi
Hyegang Son
Changhoon Kim
Young Geun Kim
AAML
117
0
0
10 Oct 2025
AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models
Xiaoshuang Ji
Zhendong Zhao
Xiaoyan Gu
Xiaojun Chen
Xin Zhao
Zeyao Liu
123
0
0
09 Oct 2025
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
Alexander Rubinstein
Benjamin Raible
Martin Gubri
Seong Joon Oh
ELM
383
0
1
09 Oct 2025
Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models
S M Rafiuddin
Muntaha Nujat Khan
RALM
KELM
142
0
0
09 Oct 2025
SliceFine: The Universal Winning-Slice Hypothesis for Pretrained Networks
Md. Kowsher
Ali O. Polat
Ehsan Mohammady Ardehaly
Mehrdad Salehi
Zia Ghiasi
Prasanth Murali
Chen Chen
189
2
0
09 Oct 2025
Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors
Vasileios Titopoulos
K. Alexandridis
G. Dimitrakopoulos
111
0
0
08 Oct 2025
Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain
Yue Li
Ran Tao
Derek Hommel
Yusuf Denizay Dönder
Sungyong Chang
David Mimno
Unso Eun Seo Jo
156
0
0
08 Oct 2025
Reasoning for Hierarchical Text Classification: The Case of Patents
Lekang Jiang
Wenjun Sun
Stephan Goetz
BDL
148
7
0
08 Oct 2025
Benchmarking is Broken -- Don't Let AI be its Own Judge
Zerui Cheng
Stella Wohnig
Ruchika Gupta
Samiul Alam
Tassallah Abdullahi
...
Daniel Kirste
Aaron Gokaslan
Mikołaj Glinka
Carsten Eickhoff
Ruben Wolff
ELM
154
1
0
08 Oct 2025
Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream Tasks
Yuwen Tan
Xiang Xiang
Kun He
John E. Hopcroft
116
0
0
08 Oct 2025
OBSR: Open Benchmark for Spatial Representations
Julia Moska
Oleksii Furman
Kacper Kozaczko
Szymon Leszkiewicz
Jakub Polczyk
Piotr Gramacki
Piotr Szymański
154
0
0
07 Oct 2025
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Filippo Rinaldi
Aniello Panariello
Giacomo Salici
Fengyuan Liu
Marco Ciccone
Angelo Porrello
Simone Calderara
182
0
0
07 Oct 2025
Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation
Muquan Li
Hang Gou
Dongyang Zhang
Shuang Liang
Xiurui Xie
Deqiang Ouyang
Ke Qin
DD
222
1
0
06 Oct 2025
Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models
Raha Askari
Sina Zarrieß
Özge Alaçam
Judith Sieker
ReLM
197
1
0
06 Oct 2025
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Sara Kangaslahti
Nihal V. Nayak
Jonathan Geuter
Marco Fumero
Francesco Locatello
David Alvarez-Melis
158
0
0
06 Oct 2025
COLE: a Comprehensive Benchmark for French Language Understanding Evaluation
David Beauchemin
Yan Tremblay
Mohamed Amine Youssef
Richard Khoury
ELM
307
1
0
06 Oct 2025
A Set of Quebec-French Corpus of Regional Expressions and Terms
David Beauchemin
Yan Tremblay
Mohamed Amine Youssef
Richard Khoury
135
2
0
06 Oct 2025
SocialNLI: A Dialogue-Centric Social Inference Dataset
Akhil Deo
Kate Sanders
Benjamin Van Durme
142
0
0
06 Oct 2025
Modeling Time Series Dynamics with Fourier Ordinary Differential Equations
Muhao Guo
Yang Weng
AI4TS
156
0
0
05 Oct 2025
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
Apurva Badithela
David Snyder
Lihan Zha
Joseph Mikhail
Matthew O'Kelly
Anushri Dixit
Anirudha Majumdar
139
3
0
05 Oct 2025