arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (showing 41 of 741)

Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna, Naveen Arivazhagan, Orhan Firat. 17 Feb 2020.

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar. ICML 2020. 17 Feb 2020.

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee. ICLR 2020. 30 Jan 2020.

Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, Iryna Gurevych. TACL 2020. 29 Jan 2020.

SANST: A Self-Attentive Network for Next Point-of-Interest Recommendation
Qi Guo, Jianzhong Qi. 22 Jan 2020.

Block-wise Dynamic Sparseness
Amir Hadifar, Johannes Deleu, Chris Develder, Thomas Demeester. Pattern Recognition Letters, 2020. 14 Jan 2020.

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Yanjie Liang, Jialin Li, Jingren Zhou. IJCAI 2020. 13 Jan 2020.

Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K, Zihan Wang, Stephen D. Mayhew, Dan Roth. ICLR 2020. 17 Dec 2019.

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will. 13 Dec 2019.

TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP
Nils Rethmeier, V. Saxena, Isabelle Augenstein. 02 Dec 2019.

Do Attention Heads in BERT Track Syntactic Dependencies?
Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman. 27 Nov 2019.

Graph Transformer for Graph-to-Sequence Learning
Deng Cai, Wai Lam. AAAI 2020. 18 Nov 2019.

What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemter. 13 Nov 2019.

Understanding Multi-Head Attention in Abstractive Summarization
Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke. 10 Nov 2019.

Blockwise Self-Attention for Long Document Understanding
Jiezhong Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang. Findings of EMNLP 2020. 07 Nov 2019.

Efficiency through Auto-Sizing: Notre Dame NLP's Submission to the WNGT 2019 Efficiency Task
Kenton W. Murray, Brian DuSell, David Chiang. EMNLP 2019. 16 Oct 2019.

Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley, Rishav Chakravarti, Avirup Sil. 14 Oct 2019.

exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models
Benjamin Hoover, Hendrik Strobelt, Sebastian Gehrmann. 11 Oct 2019.

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei. EMNLP 2019. 10 Oct 2019.

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin. ICLR 2020. 25 Sep 2019.

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. Findings of EMNLP 2020. 23 Sep 2019.

SANVis: Visual Analytics for Understanding Self-Attention Networks
Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, J. Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, Jaegul Choo. IEEE VIS 2019. 13 Sep 2019.

Multi-Granularity Self-Attention for Neural Machine Translation
Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu. EMNLP 2019. 05 Sep 2019.

The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita, Rico Sennrich, Ivan Titov. EMNLP 2019. 03 Sep 2019.

Rotate King to get Queen: Word Relationships as Orthogonal Transformations in Embedding Space
Kawin Ethayarajh. EMNLP 2019. 02 Sep 2019.

Improving Multi-Head Attention with Capsule Networks
Shuhao Gu, Yang Feng. NLPCC 2019. 31 Aug 2019.

Adaptively Sparse Transformers
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins. EMNLP 2019. 30 Aug 2019.

Encoders Help You Disambiguate Word Senses in Neural Machine Translation
Gongbo Tang, Rico Sennrich, Joakim Nivre. EMNLP 2019. 30 Aug 2019.

Revealing the Dark Secrets of BERT
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. EMNLP 2019. 21 Aug 2019.

On Identifiability in Transformers
Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer. ICLR 2020. 12 Aug 2019.

VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. 09 Aug 2019.

Is artificial data useful for biomedical Natural Language Processing algorithms?
Zixu Wang, Julia Ive, Sumithra Velupillai, Lucia Specia. 01 Jul 2019.

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?
Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke. 01 Jul 2019.

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier. Interspeech 2019. 29 Jun 2019.

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn. TACL 2020. 16 Jun 2019.

A Multiscale Visualization of Attention in the Transformer Model
Jesse Vig. ACL 2019. 12 Jun 2019.

What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. 11 Jun 2019.

Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov. 07 Jun 2019.

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig. NeurIPS 2019. 25 May 2019.

An Attentive Survey of Attention Models
Sneha Chaudhari, Varun Mithal, Gungor Polatkan, Rohan Ramanath. 05 Apr 2019.

Attention in Natural Language Processing
Andrea Galassi, Marco Lippi, Paolo Torroni. 04 Feb 2019.