arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
Papers citing
"Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 of 741 papers shown
Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal
Sharath Chandra Raparthy
Irina Rish
Yoshua Bengio
Guillaume Lajoie
213
20
0
18 Oct 2021
Improving Transformers with Probabilistic Attention Keys
Tam Nguyen
T. Nguyen
Dung D. Le
Duy Khuong Nguyen
Viet-Anh Tran
Richard G. Baraniuk
Nhat Ho
Stanley J. Osher
209
36
0
16 Oct 2021
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du
Subhabrata Mukherjee
Yu Cheng
Milad Shokouhi
Helen Zhou
Ahmed Hassan Awadallah
225
18
0
16 Oct 2021
Breaking Down Multilingual Machine Translation
Ting-Rui Chiang
Yi-Pei Chen
Yi-Ting Yeh
Graham Neubig
145
15
0
15 Oct 2021
On the Pitfalls of Analyzing Individual Neurons in Language Models
Omer Antverg
Yonatan Belinkov
MILM
246
62
0
14 Oct 2021
Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli
Ayan Chakrabarti
Andreas Veit
Michal Lukasik
Himanshu Jain
Frederick Liu
Yin-Wen Chang
Sanjiv Kumar
158
38
0
13 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Computer Vision and Pattern Recognition (CVPR), 2021
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
213
80
0
10 Oct 2021
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim
Iksoo Choi
Wonyong Sung
Jungwook Choi
126
20
0
07 Oct 2021
On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
Gal Patel
Leshem Choshen
Omri Abend
277
2
0
06 Oct 2021
How BPE Affects Memorization in Transformers
Eugene Kharitonov
Marco Baroni
Dieuwke Hupkes
447
37
0
06 Oct 2021
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
429
164
0
05 Oct 2021
On the Prunability of Attention Heads in Multilingual BERT
Aakriti Budhraja
Madhura Pande
Pratyush Kumar
Mitesh M. Khapra
166
5
0
26 Sep 2021
Predicting Attention Sparsity in Transformers
Marcos Vinícius Treviso
António Góis
Patrick Fernandes
E. Fonseca
André F. T. Martins
370
17
0
24 Sep 2021
Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information?
Julia Rozanova
Deborah Ferreira
K. Dubba
Weiwei Cheng
Dell Zhang
André Freitas
LM&Ro
161
12
0
17 Sep 2021
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
412
58
0
15 Sep 2021
The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders
Han He
Jinho Choi
264
129
0
14 Sep 2021
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
293
35
0
13 Sep 2021
Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions
Javier Ferrando
Marta R. Costa-jussà
124
16
0
13 Sep 2021
GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks
Weicheng Ma
Renze Lou
Kai Zhang
Lili Wang
Soroush Vosoughi
179
9
0
13 Sep 2021
Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model
Shaolei Zhang
Yang Feng
247
24
0
11 Sep 2021
Document-level Entity-based Extraction as Template Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kung-Hsiang Huang
Sam Tang
Nanyun Peng
159
62
0
10 Sep 2021
Block Pruning For Faster Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
François Lagunas
Ella Charlaix
Victor Sanh
Alexander M. Rush
VLM
246
252
0
10 Sep 2021
Bag of Tricks for Optimizing Transformer Efficiency
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ye Lin
Yanyang Li
Tong Xiao
Jingbo Zhu
131
7
0
09 Sep 2021
Transformers in the loop: Polarity in neural models of language
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Lisa Bylinina
Alexey Tikhonov
129
0
0
08 Sep 2021
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Potsawee Manakul
Mark Gales
114
5
0
08 Sep 2021
Interactively Providing Explanations for Transformer Language Models
Felix Friedrich
P. Schramowski
Christopher Tauchmann
Kristian Kersting
LRM
420
6
0
02 Sep 2021
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
G. Chrysostomou
Nikolaos Aletras
205
22
0
31 Aug 2021
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Raymond Li
Wen Xiao
Lanjun Wang
Hyeju Jang
Giuseppe Carenini
ViT
161
24
0
31 Aug 2021
Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning
Ran Tian
Joshua Maynez
Ankur P. Parikh
ViT
140
2
0
30 Aug 2021
Layer-wise Model Pruning based on Mutual Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chun Fan
Jiwei Li
Xiang Ao
Leilei Gan
Yuxian Meng
Xiaofei Sun
158
23
0
28 Aug 2021
Fine-Tuning Pretrained Language Models With Label Attention for Biomedical Text Classification
Bruce Nguyen
Shaoxiong Ji
MedIm
200
7
0
26 Aug 2021
Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks
Weicheng Ma
Kai Zhang
Renze Lou
Lili Wang
Soroush Vosoughi
749
22
0
18 Aug 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM
LM&MA
313
314
0
12 Aug 2021
Adaptive Multi-Resolution Attention with Linear Complexity
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Yao Zhang
Yunpu Ma
T. Seidl
Volker Tresp
118
2
0
10 Aug 2021
Differentiable Subset Pruning of Transformer Heads
Transactions of the Association for Computational Linguistics (TACL), 2021
Jiaoda Li
Robert Bamler
Mrinmaya Sachan
355
63
0
10 Aug 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
Neural Information Processing Systems (NeurIPS), 2021
T. Nguyen
Vai Suliafu
Stanley J. Osher
Long Chen
Bao Wang
151
38
0
05 Aug 2021
A Dynamic Head Importance Computation Mechanism for Neural Machine Translation
Akshay Goindani
Manish Shrivastava
130
5
0
03 Aug 2021
Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability
Roman Levin
Manli Shu
Eitan Borgnia
Furong Huang
Micah Goldblum
Tom Goldstein
FAtt
AAML
119
12
0
03 Aug 2021
Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
IEEE International Conference on Computer Vision (ICCV), 2021
Ben Saunders
Necati Cihan Camgöz
Richard Bowden
SLR
267
74
0
23 Jul 2021
More Parameters? No Thanks!
Findings of the Association for Computational Linguistics, 2021
Zeeshan Khan
Kartheek Akella
Vinay P. Namboodiri
C. V. Jawahar
110
1
0
20 Jul 2021
Learned Token Pruning for Transformers
Sehoon Kim
Sheng Shen
D. Thorsley
A. Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
355
193
0
02 Jul 2021
A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni
Gowtham Ramesh
Mitesh M. Khapra
Anoop Kunchukuttan
Pratyush Kumar
LRM
224
87
0
01 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
300
324
0
01 Jul 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis
International Conference on Learning Representations (ICLR), 2021
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander D'Amour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
339
100
0
30 Jun 2021
It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Findings of the Association for Computational Linguistics, 2021
Alexey Tikhonov
Max Ryabinin
LRM
237
75
0
22 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
190
57
0
19 Jun 2021
Soft Attention: Does it Actually Help to Learn Social Interactions in Pedestrian Trajectory Prediction?
L. Boucaud
Daniel Aloise
Nicolas Saunier
HAI
107
0
0
16 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
176
82
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
392
995
0
14 Jun 2021
Why Can You Lay Off Heads? Investigating How BERT Heads Transfer
Ting-Rui Chiang
Yun-Nung Chen
92
0
0
14 Jun 2021
Page 11 of 15