arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 / 169 papers shown
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
Mamba
48
0
0
13 May 2025
Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna
Afra Alishahi
Frédéric Blain
Eva Vanmassenhove
22
0
0
13 May 2025
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo
Gangjae Jang
Haesol Park
32
0
0
28 Apr 2025
Hallucination Detection in LLMs via Topological Divergence on Attention Graphs
Alexandra Bazarova
Aleksandr Yugay
Andrey Shulga
A. Ermilova
Andrei Volodichev
...
Dmitry Simakov
M. Savchenko
Andrey Savchenko
Serguei Barannikov
Alexey Zaytsev
HILM
28
0
0
14 Apr 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao
Shuaishuai Zu
38
0
0
11 Apr 2025
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Pedro Sandoval-Segura
Xijun Wang
Ashwinee Panda
Micah Goldblum
Ronen Basri
Tom Goldstein
David Jacobs
17
0
0
04 Apr 2025
Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles
Chen Wei Kuo
Kevin Chu
Nouar Aldahoul
Hazem Ibrahim
Talal Rahwan
Yasir Zaki
SyDa
54
0
0
04 Apr 2025
Language Models at the Syntax-Semantics Interface: A Case Study of the Long-Distance Binding of Chinese Reflexive ziji
Xiulin Yang
35
0
0
02 Apr 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
45
0
0
14 Mar 2025
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou
Tammy Riklin-Raviv
62
0
0
27 Feb 2025
A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Angelica I Aviles-Rivero
Chuanlong Xie
Yao Zhu
123
1
0
26 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
86
3
0
24 Feb 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
34
1
0
19 Feb 2025
Exploring Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
59
1
0
17 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
155
0
0
04 Feb 2025
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng
Junlin Lv
Yukun Cao
Xike Xie
S. K. Zhou
VLM
53
27
0
28 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
38
0
0
08 Jan 2025
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
54
3
0
17 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
48
1
0
31 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
29
13
0
15 Oct 2024
Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah
Ravish Kumar
Honnesh Rohmetra
Ehsan Saboori
ViT
21
0
0
12 Oct 2024
Enhancing elusive clues in knowledge learning by contrasting attention of language models
Jian Gao
Xiao Zhang
Ji Wu
Miao Li
38
0
0
26 Sep 2024
Explanation Bottleneck Models
Shin'ya Yamaguchi
Kosuke Nishida
LRM
BDL
49
1
0
26 Sep 2024
Collaborative Learning for Enhanced Unsupervised Domain Adaptation
Minhee Cho
Hyesong Choi
Hayeon Jo
Dongbo Min
25
1
0
04 Sep 2024
Explainable Artificial Intelligence: A Survey of Needs, Techniques, Applications, and Future Direction
Melkamu Mersha
Khang Lam
Joseph Wood
Ali AlShami
Jugal Kalita
XAI
AI4TS
67
28
0
30 Aug 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM
ViT
34
6
0
05 Jul 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino
Sarath Shekkizhar
LRM
44
2
0
02 Jul 2024
Inpainting the Gaps: A Novel Framework for Evaluating Explanation Methods in Vision Transformers
Lokesh Badisa
Sumohana S. Channappayya
40
0
0
17 Jun 2024
Attention as a Hypernetwork
Simon Schug
Seijin Kobayashi
Yassir Akram
João Sacramento
Razvan Pascanu
GNN
37
3
0
09 Jun 2024
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
MILM
54
16
0
06 Jun 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi
Francesca Mignacco
Kazuki Irie
H. Sompolinsky
42
6
0
24 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
43
2
0
22 May 2024
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Yao Fu
22
19
0
14 May 2024
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Abhinav Agarwalla
Abhay Gupta
Alexandre Marques
Shubhra Pandit
Michael Goin
...
Tuan Nguyen
Mahmoud Salem
Dan Alistarh
Sean Lie
Mark Kurtz
MoE
SyDa
38
11
0
06 May 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
41
79
0
26 Mar 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
26
1
0
13 Mar 2024
Where does In-context Translation Happen in Large Language Models
Suzanna Sia
David Mueller
Kevin Duh
LRM
33
0
0
07 Mar 2024
Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations
Stephanie Brandl
Oliver Eberle
Tiago F. R. Ribeiro
Anders Søgaard
Nora Hollenstein
38
1
0
29 Feb 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar
Tejaswini Pedapati
Ronny Luss
Soham Dan
Aurélie C. Lozano
Payel Das
Georgios Kollias
22
3
0
28 Feb 2024
SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda
Kyriakos Axiotis
Gang Fu
M. Bateni
Vahab Mirrokni
39
0
0
27 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
39
121
0
26 Jan 2024
When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao
Jiaxuan Zhao
Licheng Jiao
Lingling Li
Fang Liu
Shuyuan Yang
64
13
0
19 Jan 2024
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
29
2
0
08 Nov 2023
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Yifan Hou
Jiaoda Li
Yu Fei
Alessandro Stolfo
Wangchunshu Zhou
Guangtao Zeng
Antoine Bosselut
Mrinmaya Sachan
LRM
30
39
0
23 Oct 2023
Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li
Shaonan Wang
Yunhao Zhang
Jiajun Zhang
Chengqing Zong
27
4
0
16 Oct 2023
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yu-xin Zhang
Lirui Zhao
Mingbao Lin
Yunyun Sun
Yiwu Yao
Xingjia Han
Jared Tanner
Shiwei Liu
Rongrong Ji
SyDa
37
40
0
13 Oct 2023
Evaluating Explanation Methods for Vision-and-Language Navigation
Guanqi Chen
Lei Yang
Guanhua Chen
Jia Pan
XAI
21
0
0
10 Oct 2023
Image-level supervision and self-training for transformer-based cross-modality tumor segmentation
Malo de Boisredon
Eugene Vorontsov
W. Le
Samuel Kadoury
MedIm
ViT
25
0
0
17 Sep 2023
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
41
8
0
23 Aug 2023