Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
23 May 2019 · arXiv:1905.09418

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (50 of 169 papers shown)

Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques. Daking Rai, Bailin Wang, Yilun Zhou, Ziyu Yao. 27 May 2023.
Attention Mixtures for Time-Aware Sequential Recommendation. Viet-Anh Tran, Guillaume Salha-Galvan, Bruno Sguerra, Romain Hennequin. 17 Apr 2023.
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes. Daniel Fernando Campos, Alexandre Marques, Mark Kurtz, Chengxiang Zhai. 30 Mar 2023. [VLM, AAML]
Transformers in Speech Processing: A Survey. S. Latif, Aun Zaidi, Heriberto Cuayáhuitl, Fahad Shamshad, Moazzam Shoukat, Junaid Qadir. 21 Mar 2023.
An Overview on Language Models: Recent Developments and Outlook. Chengwei Wei, Yun Cheng Wang, Bin Wang, C.-C. Jay Kuo. 10 Mar 2023.
Training-Free Acceleration of ViTs with Delayed Spatial Merging. J. Heo, Seyedarmin Azizi, A. Fayyazi, Massoud Pedram. 04 Mar 2023.
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models. Mohammadreza Banaei, Klaudia Bałazy, Artur Kasymov, R. Lebret, Jacek Tabor, Karl Aberer. 08 Feb 2023. [OffRL]
Exploring Attention Map Reuse for Efficient Transformer Neural Networks. Kyuhong Shim, Jungwook Choi, Wonyong Sung. 29 Jan 2023. [ViT]
Holistically Explainable Vision Transformers. Moritz D Boehle, Mario Fritz, Bernt Schiele. 20 Jan 2023. [ViT]
EIT: Enhanced Interactive Transformer. Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu. 20 Dec 2022.
Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model. Yeskendir Koishekenov, Alexandre Berard, Vassilina Nikoulina. 19 Dec 2022. [MoE]
Vision Transformer Computation and Resilience for Dynamic Inference. Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, S. Keckler, M. Horowitz. 06 Dec 2022.
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers. A. Deshpande, Md Arafat Sultan, Anthony Ferritto, A. Kalyan, Karthik Narasimhan, Avirup Sil. 29 Nov 2022. [MoE]
Explanation on Pretraining Bias of Finetuned Vision Transformer. Bumjin Park, Jaesik Choi. 18 Nov 2022. [ViT]
Compressing Transformer-based self-supervised models for speech processing. Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang. 17 Nov 2022.
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers. Z. Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng-rong Li, Yuxiong He. 17 Nov 2022.
Finding Skill Neurons in Pre-trained Transformer-based Language Models. Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li. 14 Nov 2022. [MILM, MoE]
ViT-CX: Causal Explanation of Vision Transformers. Weiyan Xie, Xiao-hui Li, Caleb Chen Cao, Nevin L. Zhang. 06 Nov 2022. [ViT]
Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks. Rochelle Choenni, Dan Garrette, Ekaterina Shutova. 31 Oct 2022.
Is Encoder-Decoder Redundant for Neural Machine Translation? Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney. 21 Oct 2022.
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning. Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei. 18 Oct 2022.
Token Merging: Your ViT But Faster. Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman. 17 Oct 2022. [MoMe]
Parameter-Efficient Tuning with Special Token Adaptation. Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen. 10 Oct 2022.
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks. Yuxuan Li, James L. McClelland. 02 Oct 2022.
Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks. Xiaofeng Lei, Shaohua Li, Xinxing Xu, H. Fu, Yong Liu, ..., Mingrui Tan, Yanyu Xu, Jocelyn Hui Lin Goh, Rick Siow Mong Goh, Ching-Yu Cheng. 25 Sep 2022.
In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
Relaxed Attention for Transformer Models. Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt. 20 Sep 2022. [KELM]
Analyzing Transformers in Embedding Space. Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant. 06 Sep 2022.
Efficient Methods for Natural Language Processing: A Survey. Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz. 31 Aug 2022.
Survey: Exploiting Data Redundancy for Optimization of Deep Learning. Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen. 29 Aug 2022.
Probing via Prompting. Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan. 04 Jul 2022.
The Topological BERT: Transforming Attention into Topology for Natural Language Processing. Ilan Perez, Raphael Reinauer. 30 Jun 2022.
Optimizing Relevance Maps of Vision Transformers Improves Robustness. Hila Chefer, Idan Schwartz, Lior Wolf. 02 Jun 2022. [ViT]
Lack of Fluency is Hurting Your Translation Model. J. Yoo, Jaewoo Kang. 24 May 2022.
Life after BERT: What do Other Muppets Understand about Language? Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky. 21 May 2022.
Adaptable Adapters. N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych. 03 May 2022.
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes. Derya Soydaner. 27 Apr 2022. [3DV]
Merging of neural networks. Martin Pasen, Vladimír Boza. 21 Apr 2022. [FedML, MoMe]
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding. Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren. 06 Apr 2022.
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation. Nishant Kambhatla, Logan Born, Anoop Sarkar. 01 Apr 2022.
TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models. Ziqing Yang, Yiming Cui, Zhigang Chen. 30 Mar 2022. [SyDa, VLM]
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection. Xin Huang, A. Khetan, Rene Bidart, Zohar S. Karnin. 27 Mar 2022.
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, ..., David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder. 24 Mar 2022.
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention. Zuzana Jelčicová, Marian Verhelst. 20 Mar 2022.
A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification. Dairui Liu, Derek Greene, Ruihai Dong. 14 Mar 2022.
Data-Efficient Structured Pruning via Submodular Optimization. Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien. 09 Mar 2022.
Understanding microbiome dynamics via interpretable graph representation learning. K. Melnyk, Kuba Weimann, Tim Conrad. 02 Mar 2022.
XAI for Transformers: Better Explanations through Conservative Propagation. Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf. 15 Feb 2022. [FAtt]
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning. J. Tan, Y. Tan, C. Chan, Joon Huang Chuah. 11 Feb 2022. [VLM, ViT]
Can Model Compression Improve NLP Fairness. Guangxuan Xu, Qingyuan Hu. 21 Jan 2022.