1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
Papers citing
"Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 / 741 papers shown
Under the Shadow of Babel: How Language Shapes Reasoning in LLMs
Chenxi Wang
Y. Zhang
Lang Gao
Zixiang Xu
Zirui Song
Zixiang Xu
Xiuying Chen
159
1
0
19 Jun 2025
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Reduan Achtibat
Patrick Kahardipraja
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
253
2
0
16 Jun 2025
The Synthetic Mirror -- Synthetic Data at the Age of Agentic AI
Marcelle Momha
208
1
0
15 Jun 2025
A correlation-permutation approach for speech-music encoders model merging
Fabian Ritter-Gutierrez
Yi-Cheng Lin
Jeremy H.M Wong
Hung-yi Lee
Eng Siong Chng
Nancy F. Chen
MoMe
265
2
0
13 Jun 2025
United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory
HaoYang Shang
Xuan Liu
Zi Liang
J. Zhang
Haibo Hu
Song Guo
LLMAG
246
5
0
07 Jun 2025
Relational reasoning and inductive bias in transformers trained on a transitive inference task
J. Geerts
Stephanie Chan
Claudia Clopath
Kimberly L. Stachenfeld
LRM
196
2
0
04 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
467
0
0
04 Jun 2025
It's Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Iuliia Zaitova
Badr M. Abdullah
Wei Xue
Dietrich Klakow
Bernd Möbius
T. Avgustinova
175
1
0
03 Jun 2025
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
Woomin Song
Sai Muralidhar Jayanthi
S. Ronanki
Kanthashree Mysore Sathyendra
Jinwoo Shin
Aram Galstyan
Shubham Katiyar
S. Bodapati
VLM
356
0
0
01 Jun 2025
Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective
Lei Lei
Jie Gu
Xiaokang Ma
Chu Tang
Jingmin Chen
Tong Xu
252
1
0
01 Jun 2025
Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection
Yeshwanth Venkatesha
Souvik Kundu
Priyadarshini Panda
170
1
0
31 May 2025
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan
F. Tonin
Volkan Cevher
365
1
0
27 May 2025
Relevance-driven Input Dropout: an Explanation-guided Regularization Technique
Shreyas Gururaj
Lars Grüne
Wojciech Samek
Sebastian Lapuschkin
Leander Weber
408
1
0
27 May 2025
How Syntax Specialization Emerges in Language Models
Xufeng Duan
Zhaoqian Yao
Yunhao Zhang
Shaonan Wang
Zhenguang G. Cai
MILM
LRM
211
5
0
26 May 2025
Response Uncertainty and Probe Modeling: Two Sides of the Same Coin in LLM Interpretability?
Yongjie Wang
Yibo Wang
Xin Zhou
Zhiqi Shen
217
1
0
24 May 2025
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models
Hao Chen
Haoze Li
Zhiqing Xiao
Lirong Gao
Qi Zhang
Xiaomeng Hu
Ningtao Wang
Xing Fu
Junbo Zhao
593
0
0
24 May 2025
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs
Lucas Bandarkar
Nanyun Peng
MoMe
LRM
317
1
0
23 May 2025
Multi-Scale Probabilistic Generation Theory: A Unified Information-Theoretic Framework for Hierarchical Structure in Large Language Models
Yukin Zhang
Qi Dong
290
0
0
23 May 2025
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong
Jiho Jang
Nojun Kwak
462
0
0
22 May 2025
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
Sreetama Sarkar
Yue Che
Alex Gavin
Peter A. Beerel
Souvik Kundu
MLLM
VLM
258
7
0
22 May 2025
SUS backprop: linear backpropagation algorithm for long inputs in transformers
Sergey Pankov
Georges Harik
329
0
0
21 May 2025
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Patrick Kahardipraja
Reduan Achtibat
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
357
4
0
21 May 2025
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
Ingeol Baek
Hwan Chang
Sunghyun Ryu
Hwanhee Lee
197
2
0
21 May 2025
PiT: Progressive Diffusion Transformer
Jiafu Wu
Yabiao Wang
Jian Li
Jinlong Peng
Yun Cao
Chengjie Wang
Jiangning Zhang
616
0
0
19 May 2025
K-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks
Pratim Chowdhary
Peter Chin
Deepernab Chakrabarty
282
0
0
18 May 2025
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
Mamba
468
3
0
13 May 2025
Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna
Afra Alishahi
Frédéric Blain
Eva Vanmassenhove
339
3
0
13 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu
Zhaoxiang Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
906
40
0
10 May 2025
Polysemy of Synthetic Neurons Towards a New Type of Explanatory Categorical Vector Spaces
Michael Pichat
William Pogrund
Paloma Pichat
Judicael Poumay
Armanouche Gasparian
Samuel Demarchi
Martin Corbet
Alois Georgeon
Michael Veillet-Guillem
MILM
293
0
0
30 Apr 2025
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
IEEE International Conference on Image Processing (ICIP), 2025
Sehyeong Jo
Gangjae Jang
Haesol Park
370
6
0
28 Apr 2025
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings
Saniya Karwa
Navpreet Singh
CoGe
274
2
0
20 Apr 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao
Shuaishuai Zu
396
1
0
11 Apr 2025
A Meaningful Perturbation Metric for Evaluating Explainability Methods
Scandinavian Conference on Image Analysis (SCIA), 2025
Danielle Cohen
Hila Chefer
Lior Wolf
AAML
204
1
0
09 Apr 2025
Activation Patching for Interpretable Steering in Music Generation
Simone Facchiano
Giorgio Strano
Donato Crisostomi
Irene Tallini
Tommaso Mencattini
Fabio Galasso
Emanuele Rodolà
LLMSV
224
2
0
06 Apr 2025
Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles
Chen Wei Kuo
Kevin Chu
Nouar Aldahoul
Hazem Ibrahim
Talal Rahwan
Yasir Zaki
SyDa
384
0
0
04 Apr 2025
Identifying and Evaluating Inactive Heads in Pretrained LLMs
Pedro Sandoval-Segura
Xijun Wang
Ashwinee Panda
Micah Goldblum
Ronen Basri
Tom Goldstein
David Jacobs
444
1
0
04 Apr 2025
Language Models at the Syntax-Semantics Interface: A Case Study of the Long-Distance Binding of Chinese Reflexive ziji
International Conference on Computational Linguistics (COLING), 2025
Xiulin Yang
348
2
0
02 Apr 2025
Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition
ACM Multimedia (MM), 2024
Muxin Pu
Mei Kuan Lim
Chun Yong Chong
SLR
244
8
0
26 Mar 2025
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou
Dayu Li
Jinshan Pan
Juncheng Zhou
Jinglei Shi
Jufeng Yang
307
1
0
26 Mar 2025
Linguistic Blind Spots of Large Language Models
Jiali Cheng
Hadi Amiri
361
1
0
25 Mar 2025
Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta
Sushrut Karmalkar
348
2
0
21 Mar 2025
Intra-neuronal attention within language models Relationships between activation and semantics
Michael Pichat
William Pogrund
Paloma Pichat
Armanouche Gasparian
Samuel Demarchi
Corbet Alois Georgeon
Michael Veillet-Guillem
MILM
259
0
0
17 Mar 2025
Mixed-granularity Implicit Representation for Continuous Hyperspectral Compressive Reconstruction
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Jianan Li
Huan Chen
Wangcai Zhao
Rui Chen
Tingfa Xu
259
3
0
17 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
358
1
0
17 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
538
3
0
14 Mar 2025
ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
Xin Liu
Xudong Wang
Pei Liu
Guoming Tang
MoMe
277
0
0
13 Mar 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Jiajun Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
431
4
0
13 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
304
31
0
08 Mar 2025
A Theory of Initialisation's Impact on Specialisation
International Conference on Learning Representations (ICLR), 2025
Devon Jarvis
Sebastian Lee
Clémentine Dominé
Andrew M. Saxe
Stefano Sarao Mannelli
CLL
297
2
0
04 Mar 2025
AxBERT: An Interpretable Chinese Spelling Correction Method Driven by Associative Knowledge Network
Fanyu Wang
Hangyu Zhu
Zhenping Xie
224
0
0
04 Mar 2025