arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (showing 41 of 741)

Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna, Naveen Arivazhagan, Orhan Firat. 17 Feb 2020.

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar. ICML 2020. 17 Feb 2020.

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee. ICLR 2020. 30 Jan 2020.

Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, Iryna Gurevych. TACL 2020. 29 Jan 2020.

SANST: A Self-Attentive Network for Next Point-of-Interest Recommendation
Qi Guo, Jianzhong Qi. 22 Jan 2020.

Block-wise Dynamic Sparseness
Amir Hadifar, Johannes Deleu, Chris Develder, Thomas Demeester. Pattern Recognition Letters, 2020. 14 Jan 2020.

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Yanjie Liang, Jialin Li, Jingren Zhou. IJCAI 2020. 13 Jan 2020.

Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K, Zihan Wang, Stephen D. Mayhew, Dan Roth. ICLR 2020. 17 Dec 2019.

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will. 13 Dec 2019.

TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP
Nils Rethmeier, V. Saxena, Isabelle Augenstein. 02 Dec 2019.

Do Attention Heads in BERT Track Syntactic Dependencies?
Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman. 27 Nov 2019.

Graph Transformer for Graph-to-Sequence Learning
Deng Cai, Wai Lam. AAAI 2020. 18 Nov 2019.

What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemter. 13 Nov 2019.

Understanding Multi-Head Attention in Abstractive Summarization
Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke. 10 Nov 2019.

Blockwise Self-Attention for Long Document Understanding
Jiezhong Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang. Findings of EMNLP 2020. 07 Nov 2019.

Efficiency through Auto-Sizing: Notre Dame NLP's Submission to the WNGT 2019 Efficiency Task
Kenton W. Murray, Brian DuSell, David Chiang. EMNLP 2019. 16 Oct 2019.

Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley, Rishav Chakravarti, Avirup Sil. 14 Oct 2019.

exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models
Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann. 11 Oct 2019.

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei. EMNLP 2019. 10 Oct 2019.

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin. ICLR 2020. 25 Sep 2019.

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. Findings of EMNLP 2020. 23 Sep 2019.

SANVis: Visual Analytics for Understanding Self-Attention Networks
Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, J. Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, Jaegul Choo. IEEE VIS 2019. 13 Sep 2019.

Multi-Granularity Self-Attention for Neural Machine Translation
Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu. EMNLP 2019. 05 Sep 2019.

The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita, Rico Sennrich, Ivan Titov. EMNLP 2019. 03 Sep 2019.

Rotate King to get Queen: Word Relationships as Orthogonal Transformations in Embedding Space
Kawin Ethayarajh. EMNLP 2019. 02 Sep 2019.

Improving Multi-Head Attention with Capsule Networks
Shuhao Gu, Yang Feng. NLPCC 2019. 31 Aug 2019.

Adaptively Sparse Transformers
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins. EMNLP 2019. 30 Aug 2019.

Encoders Help You Disambiguate Word Senses in Neural Machine Translation
Gongbo Tang, Rico Sennrich, Joakim Nivre. EMNLP 2019. 30 Aug 2019.

Revealing the Dark Secrets of BERT
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. EMNLP 2019. 21 Aug 2019.

On Identifiability in Transformers
Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer. ICLR 2020. 12 Aug 2019.

VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. 09 Aug 2019.

Is artificial data useful for biomedical Natural Language Processing algorithms?
Zixu Wang, Julia Ive, Sumithra Velupillai, Lucia Specia. 01 Jul 2019.

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?
Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke. 01 Jul 2019.

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier. Interspeech 2019. 29 Jun 2019.

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn. TACL 2020. 16 Jun 2019.

A Multiscale Visualization of Attention in the Transformer Model
Jesse Vig. ACL 2019. 12 Jun 2019.

What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. 11 Jun 2019.

Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov. 07 Jun 2019.

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig. NeurIPS 2019. 25 May 2019.

An Attentive Survey of Attention Models
Sneha Chaudhari, Varun Mithal, Gungor Polatkan, Rohan Ramanath. 05 Apr 2019.

Attention in Natural Language Processing
Andrea Galassi, Marco Lippi, Paolo Torroni. 04 Feb 2019.