arXiv: 1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
Fedor Moiseev
Rico Sennrich
Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (50 of 741 papers shown)
Unbiased Sentence Encoder For Large-Scale Multi-lingual Search Engines
Mahdi Hajiaghayi
Monir Hajiaghayi
Mark R. Bolin
01 Mar 2021
CNN with large memory layers
R. Karimov
Yury Malkov
Karim Iskakov
Victor Lempitsky
27 Jan 2021
Attention Can Reflect Syntactic Structure (If You Let It)
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Vinit Ravishankar
Artur Kulmizev
Mostafa Abdou
Anders Søgaard
Joakim Nivre
26 Jan 2021
The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT
AAAI Conference on Artificial Intelligence (AAAI), 2021
Madhura Pande
Aakriti Budhraja
Preksha Nema
Pratyush Kumar
Mitesh M. Khapra
22 Jan 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2021
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
21 Jan 2021
Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks
Machine Intelligence Research (MIR), 2021
Zhengyan Zhang
Guangxuan Xiao
Yongwei Li
Tian Lv
Fanchao Qi
Zhiyuan Liu
Yasheng Wang
Xin Jiang
Maosong Sun
AAML
18 Jan 2021
KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization
Jing Jin
Cai Liang
Tiancheng Wu
Li Zou
Zhiliang Gan
MQ
15 Jan 2021
Of Non-Linearity and Commutativity in BERT
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Sumu Zhao
Damian Pascual
Gino Brunner
Roger Wattenhofer
12 Jan 2021
Transformers in Vision: A Survey
ACM Computing Surveys (CSUR), 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
04 Jan 2021
On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification
Zhengxuan Wu
Desmond C. Ong
01 Jan 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Xiaohan Chen
Yu Cheng
Shuohang Wang
Zhe Gan
Zinan Lin
Jingjing Liu
31 Dec 2020
Transformer Feed-Forward Layers Are Key-Value Memories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
29 Dec 2020
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning
International Conference on Learning Representations (ICLR), 2021
Xuebo Liu
Longyue Wang
Yang Li
Liang Ding
Lidia S. Chao
Zhaopeng Tu
AI4CE
29 Dec 2020
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Lei Li
Yankai Lin
Deli Chen
Shuhuai Ren
Peng Li
Jie Zhou
Xu Sun
29 Dec 2020
Multi-Head Self-Attention with Role-Guided Masks
European Conference on Information Retrieval (ECIR), 2021
Dongsheng Wang
Casper Hansen
Lucas Chaves Lima
Christian B. Hansen
Maria Maistro
J. Simonsen
Christina Lioma
22 Dec 2020
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
International Symposium on High-Performance Computer Architecture (HPCA), 2021
Hanrui Wang
Zhekai Zhang
Song Han
17 Dec 2020
Transformer Interpretability Beyond Attention Visualization
Computer Vision and Pattern Recognition (CVPR), 2021
Hila Chefer
Shir Gur
Lior Wolf
17 Dec 2020
Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
International Conference on Language Resources and Evaluation (LREC), 2022
Ileana Rugina
Rumen Dangovski
L. Jing
Preslav Nakov
Marin Soljacic
20 Nov 2020
On the Dynamics of Training Attention Models
International Conference on Learning Representations (ICLR), 2021
Haoye Lu
Yongyi Mao
A. Nayak
19 Nov 2020
Positional Artefacts Propagate Through Masked Language Model Embeddings
Ziyang Luo
Artur Kulmizev
Xiaoxi Mao
09 Nov 2020
Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models
Shucong Zhang
Erfan Loweimi
P. Bell
Steve Renals
08 Nov 2020
Rethinking the Value of Transformer Components
Wenxuan Wang
Zhaopeng Tu
07 Nov 2020
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Zhengyan Zhang
Fanchao Qi
Zhiyuan Liu
Qun Liu
Maosong Sun
VLM
07 Nov 2020
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
International Conference on Computational Linguistics (COLING), 2020
Yue Guan
Jingwen Leng
Chao Li
Quan Chen
Minyi Guo
02 Nov 2020
Influence Patterns for Explaining Information Flow in BERT
Neural Information Processing Systems (NeurIPS), 2021
Kaiji Lu
Zifan Wang
Piotr (Peter) Mardziel
Anupam Datta
GNN
02 Nov 2020
Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation
International Conference on Computational Linguistics (COLING), 2020
Shuhao Gu
Yang Feng
CLL
02 Nov 2020
Syllabification of the Divine Comedy
ACM Journal on Computing and Cultural Heritage (JOCCH), 2020
Andrea Asperti
S. Bianco
26 Oct 2020
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
Young Jin Kim
Hany Awadalla
AI4CE
26 Oct 2020
Rethinking embedding coupling in pre-trained language models
International Conference on Learning Representations (ICLR), 2021
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
24 Oct 2020
Not all parameters are born equal: Attention is mostly what you need
Nikolay Bogoychev
MoE
22 Oct 2020
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation
Elena Voita
Rico Sennrich
Ivan Titov
21 Oct 2020
Generating Diverse Translation from Model Distribution with Dropout
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Xuanfu Wu
Yang Feng
Chenze Shao
16 Oct 2020
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Artem Chumachenko
Daniil Gavrilov
Nikita Balagansky
Pavel Kalaidin
14 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin
Rodrigo Nogueira
Andrew Yates
VLM
13 Oct 2020
Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations
Nikolaos Manginas
Ilias Chalkidis
Prodromos Malakasiotis
12 Oct 2020
The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
Jasmijn Bastings
Katja Filippova
XAI
LRM
12 Oct 2020
SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras
Nikita Kitaev
Augustus Odena
A. Dimakis
11 Oct 2020
FIND: Human-in-the-Loop Debugging Deep Text Classifiers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Piyawat Lertvittayakumjorn
Lucia Specia
Francesca Toni
10 Oct 2020
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2020
Zhengxuan Wu
Thanh-Son Nguyen
Desmond C. Ong
MILM
10 Oct 2020
Intrinsic Probing through Dimension Selection
Lucas Torroba Hennigen
Adina Williams
Ryan Cotterell
06 Oct 2020
BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations
Aina Garí Soler
Marianna Apidianaki
06 Oct 2020
On the Sub-Layer Functionalities of Transformer Decoder
Findings of EMNLP, 2020
Yilin Yang
Longyue Wang
Shuming Shi
Prasad Tadepalli
Stefan Lee
Zhaopeng Tu
06 Oct 2020
Efficient Inference For Neural Machine Translation
Y. Hsu
Sarthak Garg
Yi-Hsiu Liao
Ilya Chatsviorkin
AI4CE
06 Oct 2020
Guiding Attention for Self-Supervised Learning with Transformers
Findings of EMNLP, 2020
Ameet Deshpande
Karthik Narasimhan
06 Oct 2020
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Findings of EMNLP, 2020
Zi Lin
Jeremiah Zhe Liu
Ziao Yang
Nan Hua
Dan Roth
05 Oct 2020
Syntax Representation in Word Embeddings and Neural Networks -- A Survey
Conference on Theory and Practice of Information Technologies (TPIT), 2020
Tomasz Limisiewicz
David Mareček
NAI
02 Oct 2020
AUBER: Automated BERT Regularization
Hyun Dong Lee
Seongmin Lee
U. Kang
30 Sep 2020
TernaryBERT: Distillation-aware Ultra-low Bit BERT
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Wei Zhang
Lu Hou
Yichun Yin
Lifeng Shang
Xiao Chen
Xin Jiang
Qun Liu
MQ
27 Sep 2020
On the Ability and Limitations of Transformers to Recognize Formal Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
S. Bhattamishra
Kabir Ahuja
Navin Goyal
23 Sep 2020
Alleviating the Inequality of Attention Heads for Neural Machine Translation
International Conference on Computational Linguistics (COLING), 2020
Zewei Sun
Shujian Huang
Xinyu Dai
Jiajun Chen
21 Sep 2020
Page 13 of 15