1905.09418
Cited By
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
Papers citing
"Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 / 741 papers shown
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
VLM
301
86
0
10 Jun 2021
Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Tyler A. Chang
Yifan Xu
Weijian Xu
Zhuowen Tu
ViT
111
17
0
10 Jun 2021
Patch Slimming for Efficient Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Yehui Tang
Kai Han
Yunhe Wang
Chang Xu
Jianyuan Guo
Chao Xu
Dacheng Tao
ViT
332
194
0
05 Jun 2021
On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
Findings (Findings), 2021
Tianchu Ji
Shraddhan Jain
M. Ferdman
Peter Milder
H. Andrew Schwartz
Niranjan Balasubramanian
MQ
239
20
0
02 Jun 2021
Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?
Findings (Findings), 2021
Min Namgung
Laurent Besacier
Vassilina Nikoulina
D. Schwab
MILM
144
9
0
31 May 2021
Cascaded Head-colliding Attention
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Lin Zheng
Zhiyong Wu
Lingpeng Kong
174
3
0
31 May 2021
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
Pattern Recognition Letters (PR), 2021
David Peer
Sebastian Stabinger
Stefan Engl
A. Rodríguez-Sánchez
188
31
0
31 May 2021
On Compositional Generalization of Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yafu Li
Yongjing Yin
Yulong Chen
Yue Zhang
351
52
0
31 May 2021
On the Interplay Between Fine-tuning and Composition in Transformers
Findings (Findings), 2021
Lang-Chi Yu
Allyson Ettinger
231
14
0
31 May 2021
Cross-Lingual Abstractive Summarization with Limited Parallel Resources
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yu Bai
Yang Gao
Heyan Huang
209
54
0
28 May 2021
Inspecting the concept knowledge graph encoded by modern language models
Findings (Findings), 2021
Carlos Aspillaga
Marcelo Mendoza
Alvaro Soto
222
15
0
27 May 2021
How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
Findings (Findings), 2021
Weijia Xu
Shuming Ma
Dongdong Zhang
Marine Carpuat
194
19
0
27 May 2021
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond
Artificial Intelligence (AI), 2021
Daniel Loureiro
A. Jorge
Jose Camacho-Collados
251
30
0
26 May 2021
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Chen Liang
Simiao Zuo
Minshuo Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
T. Zhao
Weizhu Chen
174
73
0
25 May 2021
A Non-Linear Structural Probe
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Jennifer C. White
Tiago Pimentel
Naomi Saphra
Robert Bamler
144
33
0
21 May 2021
Medical Image Segmentation Using Squeeze-and-Expansion Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Shaohua Li
Xiuchao Sui
Xiangde Luo
Xinxing Xu
Yong Liu
Rick Siow Mong Goh
ViT
MedIm
163
188
0
20 May 2021
Rationalization through Concepts
Findings (Findings), 2021
Diego Antognini
Boi Faltings
FAtt
214
24
0
11 May 2021
FNet: Mixing Tokens with Fourier Transforms
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
James Lee-Thorp
Joshua Ainslie
Ilya Eckstein
Santiago Ontanon
643
641
0
09 May 2021
Long-Span Summarization via Local Attention and Content Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Potsawee Manakul
Mark Gales
231
46
0
08 May 2021
Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses
Transactions of the Association for Computational Linguistics (TACL), 2021
Aina Garí Soler
Marianna Apidianaki
MILM
415
76
0
29 Apr 2021
Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention
Workshop on Cognitive Modeling and Computational Linguistics (CMCL), 2021
S. Ryu
Richard L. Lewis
164
33
0
26 Apr 2021
Easy and Efficient Transformer : Scalable Inference Solution For large NLP model
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
GongZheng Li
Yadong Xi
Jingzhen Ding
Duan Wang
Bai Liu
Changjie Fan
Xiaoxi Mao
Zeng Zhao
262
11
0
26 Apr 2021
Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
International Conference on Artificial Neural Networks (ICANN), 2021
Cheng Chen
Yichun Yin
Lifeng Shang
Zhi Wang
Xin Jiang
Xiao Chen
Qun Liu
FedML
139
9
0
24 Apr 2021
Code Structure Guided Transformer for Source Code Summarization
ACM Transactions on Software Engineering and Methodology (TOSEM), 2021
Shuzheng Gao
Cuiyun Gao
Yulan He
Jichuan Zeng
L. Nie
Xin Xia
Michael R. Lyu
213
119
0
19 Apr 2021
BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models
International Workshop on Semantic Evaluation (SemEval), 2021
A. Islam
Weicheng Ma
Soroush Vosoughi
125
4
0
19 Apr 2021
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mozhdeh Gheini
Xiang Ren
Jonathan May
LRM
311
162
0
18 Apr 2021
Knowledge Neurons in Pretrained Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
547
577
0
18 Apr 2021
Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Dongkuan Xu
Ian En-Hsu Yen
Jinxi Zhao
Zhibin Xiao
VLM
AAML
193
66
0
18 Apr 2021
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Fangyu Liu
Ivan Vulić
Anna Korhonen
Nigel Collier
VLM
OffRL
324
132
0
16 Apr 2021
Effect of Post-processing on Contextualized Word Representations
International Conference on Computational Linguistics (COLING), 2021
Hassan Sajjad
Firoj Alam
Fahim Dalvi
Nadir Durrani
173
12
0
15 Apr 2021
Sparse Attention with Linear Units
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Biao Zhang
Ivan Titov
Rico Sennrich
263
52
0
14 Apr 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Journal of Artificial Intelligence Research (JAIR), 2021
Danielle Saunders
AI4CE
362
107
0
14 Apr 2021
DirectProbe: Studying Representations without Classifiers
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Yichu Zhou
Vivek Srikumar
219
36
0
13 Apr 2021
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Zhen Wu
Lijun Wu
Qi Meng
Ziheng Lu
Shufang Xie
Tao Qin
Xinyu Dai
Tie-Yan Liu
208
25
0
11 Apr 2021
On Biasing Transformer Attention Towards Monotonicity
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Annette Rios Gonzales
Chantal Amrhein
Noëmi Aepli
Rico Sennrich
135
9
0
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Computer Vision and Pattern Recognition (CVPR), 2021
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
149
29
0
08 Apr 2021
Attention Head Masking for Inference Time Content Selection in Abstractive Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Shuyang Cao
Lu Wang
CVBM
129
15
0
06 Apr 2021
Efficient Attentions for Long Document Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
L. Huang
Shuyang Cao
Nikolaus Nova Parulian
Heng Ji
Lu Wang
330
361
0
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
301
32
0
02 Apr 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Hila Chefer
Shir Gur
Lior Wolf
ViT
358
412
0
29 Mar 2021
Learning on heterogeneous graphs using high-order relations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
See Hian Lee
Feng Ji
Wee Peng Tay
116
4
0
29 Mar 2021
Dodrio: Exploring Transformer Models with Interactive Visualization
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Zijie J. Wang
Robert Turko
Duen Horng Chau
202
46
0
26 Mar 2021
Understanding Robustness of Transformers for Image Classification
IEEE International Conference on Computer Vision (ICCV), 2021
Srinadh Bhojanapalli
Ayan Chakrabarti
Daniel Glasner
Daliang Li
Thomas Unterthiner
Andreas Veit
ViT
313
472
0
26 Mar 2021
Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Shuhao Gu
Yang Feng
Wanying Xie
CLL
AI4CE
192
32
0
25 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
AAAI Conference on Artificial Intelligence (AAAI), 2021
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
202
30
0
24 Mar 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
IEEE Access (IEEE Access), 2021
Sushant Singh
A. Mahmood
AI4TS
325
120
0
23 Mar 2021
Learning Calibrated-Guidance for Object Detection in Aerial Images
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (J-STARS), 2021
Zongqi Wei
Dong Liang
Dong Zhang
Liyan Zhang
Qixiang Geng
Mingqiang Wei
Huiyu Zhou
326
38
0
21 Mar 2021
Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond
Knowledge and Information Systems (KAIS), 2021
Xuhong Li
Haoyi Xiong
Xingjian Li
Xuanyu Wu
Xiao Zhang
Ji Liu
Jiang Bian
Dejing Dou
AAML
FaML
XAI
HAI
294
440
0
19 Mar 2021
Approximating How Single Head Attention Learns
Charles Burton Snell
Ruiqi Zhong
Dan Klein
Jacob Steinhardt
MLT
169
33
0
13 Mar 2021
An empirical analysis of phrase-based and neural machine translation
Hamidreza Ghader
115
1
0
04 Mar 2021