arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov (23 May 2019)
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (19 of 169 papers shown)
Time-based Sequence Model for Personalization and Recommendation Systems
T. Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, A. Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong (27 Aug 2020)

BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig, Ali Madani, L. Varshney, Caiming Xiong, R. Socher, Nazneen Rajani (26 Jun 2020)

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal (16 Jun 2020)

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin (27 Apr 2020)

The Right Tool for the Job: Matching Model and Instance Complexities
Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith (16 Apr 2020)

Information-Theoretic Probing with Minimum Description Length
Elena Voita, Ivan Titov (27 Mar 2020)

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang (18 Mar 2020) [LM&MA, VLM]

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann (24 Feb 2020)

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee (30 Jan 2020)

Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K, Zihan Wang, Stephen D. Mayhew, Dan Roth (17 Dec 2019) [LRM]

Graph Transformer for Graph-to-Sequence Learning
Deng Cai, W. Lam (18 Nov 2019)

What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemter (13 Nov 2019)

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin (25 Sep 2019)

SANVis: Visual Analytics for Understanding Self-Attention Networks
Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, J. Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, Jaegul Choo (13 Sep 2019) [HAI]

On Identifiability in Transformers
Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer (12 Aug 2019) [ViT]

VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang (09 Aug 2019) [VLM]

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier (29 Jun 2019)

What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning (11 Jun 2019) [MILM]

Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov (07 Jun 2019)