arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019 · 23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (50 of 741 papers shown)
Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Bang An, Jie Lyu, Zhenyi Wang, Chunyuan Li, Changwei Hu, Fei Tan, Ruiyi Zhang, Yifan Hu, Changyou Chen
AAML · 20 Sep 2020

Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2020
Rajiv Movva, Jason Zhao
17 Sep 2020

Efficient Transformers: A Survey
ACM Computing Surveys (ACM CSUR), 2020
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
VLM · 14 Sep 2020

Time-based Sequence Model for Personalization and Recommendation Systems
T. Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, A. Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong
27 Aug 2020

Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction
ACM Conference on Recommender Systems (RecSys), 2020
Darius Afchar, Romain Hennequin
FAtt, XAI · 26 Aug 2020

TSAM: Temporal Link Prediction in Directed Networks based on Self-Attention Mechanism
Jinsong Li, Jianhua Peng, Shuxin Liu, Lintianran Weng, Cong Li
23 Aug 2020

On the Importance of Local Information in Transformer Based Models
Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra
13 Aug 2020

Compression of Deep Learning Models for Text: A Survey
ACM Transactions on Knowledge Discovery from Data (TKDD), 2020
Manish Gupta, Puneet Agrawal
VLM, MedIm, AI4CE · 12 Aug 2020

DeLighT: Deep and Light-weight Transformer
Sachin Mehta, Marjan Ghazvininejad, Srini Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
VLM · 03 Aug 2020

Spatially Aware Multimodal Transformers for TextVQA
European Conference on Computer Vision (ECCV), 2020
Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
23 Jul 2020

Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
30 Jun 2020

Multi-Head Attention: Collaborate Instead of Concatenate
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
29 Jun 2020

BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig, Ali Madani, Lav Varshney, Caiming Xiong, R. Socher, Nazneen Rajani
26 Jun 2020

A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention
Grégoire Mialon, Dexiong Chen, Alexandre d’Aspremont, Julien Mairal
OT · 22 Jun 2020

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
16 Jun 2020

Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?
Corentin Kervadec, G. Antipov, M. Baccouche, Christian Wolf
OOD · 09 Jun 2020

BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
07 Jun 2020

Distilling Neural Networks for Greener and Faster Dependency Parsing
International Workshop/Conference on Parsing Technologies (IWPT), 2020
Mark Anderson, Carlos Gómez-Rodríguez
01 Jun 2020

CNRL at SemEval-2020 Task 5: Modelling Causal Reasoning in Language with Multi-Head Self-Attention Weights based Counterfactual Detection
International Workshop on Semantic Evaluation (SemEval), 2020
Rajaswa Patil, V. Baths
31 May 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
28 May 2020

Unsupervised Quality Estimation for Neural Machine Translation
M. Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia
UQLM · 21 May 2020

Enhancing Monotonic Multihead Attention for Streaming ASR
Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
19 May 2020

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu
VLM · 15 May 2020

A Mixture of h-1 Heads is Better than h Heads
Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
MoE · 13 May 2020

The Unstoppable Rise of Computational Linguistics in Deep Learning
James Henderson
AI4CE · 13 May 2020

Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Christopher Brix, Parnia Bahar, Hermann Ney
04 May 2020

Similarity Analysis of Contextual Word Representation Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass
03 May 2020

Hard-Coded Gaussian Attention for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Weiqiu You, Simeng Sun, Mohit Iyyer
02 May 2020

When BERT Plays the Lottery, All Tickets Are Winning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Sai Prasanna, Anna Rogers, Anna Rumshisky
MILM · 01 May 2020

Does Data Augmentation Improve Generalization in NLP?
Rohan Jha, Charles Lovering, Ellie Pavlick
30 Apr 2020

How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Nicola De Cao, Michael Schlichtkrull, Wilker Aziz, Ivan Titov
30 Apr 2020

Character-Level Translation with Self-attention
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Yingqiang Gao, Nikola I. Nikolov, Yuhuang Hu, Richard H. R. Hahnloser
30 Apr 2020

Universal Dependencies according to BERT: both more specific and more general
Findings, 2020
Tomasz Limisiewicz, Rudolf Rosa, David Mareček
30 Apr 2020

What Happens To BERT Embeddings During Fine-tuning?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2020
Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney
29 Apr 2020

Scheduled DropHead: A Regularization Method for Transformer Models
Findings, 2020
Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
28 Apr 2020

Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models
Shilin He, Xing Wang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu
28 Apr 2020

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin
27 Apr 2020

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Findings, 2020
Biao Zhang, Ivan Titov, Rico Sennrich
24 Apr 2020

The Right Tool for the Job: Matching Model and Instance Complexities
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith
16 Apr 2020

Relation Transformer Network
Rajat Koner, Poulami Sinhamahapatra, Volker Tresp
ViT · 13 Apr 2020

Telling BERT's full story: from Local Attention to Global Aggregation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Damian Pascual, Gino Brunner, Roger Wattenhofer
10 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Neural Information Processing Systems (NeurIPS), 2020
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
MQ · 08 Apr 2020

On the Effect of Dropping Layers of Pre-trained Transformer Models
Computer Speech and Language (CSL), 2020
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
08 Apr 2020

Understanding Learning Dynamics for Neural Machine Translation
Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi
05 Apr 2020

Information-Theoretic Probing with Minimum Description Length
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Elena Voita, Ivan Titov
27 Mar 2020

Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers
Hongfei Xu, Josef van Genabith, Qiuhui Liu, Deyi Xiong
21 Mar 2020

Pre-trained Models for Natural Language Processing: A Survey
Science China Technological Sciences (Sci China Technol Sci), 2020
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
LM&MA, VLM · 18 Mar 2020

A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers, Olga Kovaleva, Anna Rumshisky
OffRL · 27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
26 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Findings, 2020
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
24 Feb 2020
Page 14 of 15