arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
Papers citing
"Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 of 741 papers shown
Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal
Sharath Chandra Raparthy
Irina Rish
Yoshua Bengio
Guillaume Lajoie
213
20
0
18 Oct 2021
Improving Transformers with Probabilistic Attention Keys
Tam Nguyen
T. Nguyen
Dung D. Le
Duy Khuong Nguyen
Viet-Anh Tran
Richard G. Baraniuk
Nhat Ho
Stanley J. Osher
209
36
0
16 Oct 2021
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du
Subhabrata Mukherjee
Yu Cheng
Milad Shokouhi
Helen Zhou
Ahmed Hassan Awadallah
225
18
0
16 Oct 2021
Breaking Down Multilingual Machine Translation
Ting-Rui Chiang
Yi-Pei Chen
Yi-Ting Yeh
Graham Neubig
145
15
0
15 Oct 2021
On the Pitfalls of Analyzing Individual Neurons in Language Models
Omer Antverg
Yonatan Belinkov
MILM
246
62
0
14 Oct 2021
Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli
Ayan Chakrabarti
Andreas Veit
Michal Lukasik
Himanshu Jain
Frederick Liu
Yin-Wen Chang
Sanjiv Kumar
158
38
0
13 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Computer Vision and Pattern Recognition (CVPR), 2021
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
213
80
0
10 Oct 2021
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim
Iksoo Choi
Wonyong Sung
Jungwook Choi
126
20
0
07 Oct 2021
On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
Gal Patel
Leshem Choshen
Omri Abend
277
2
0
06 Oct 2021
How BPE Affects Memorization in Transformers
Eugene Kharitonov
Marco Baroni
Dieuwke Hupkes
447
37
0
06 Oct 2021
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
429
164
0
05 Oct 2021
On the Prunability of Attention Heads in Multilingual BERT
Aakriti Budhraja
Madhura Pande
Pratyush Kumar
Mitesh M. Khapra
166
5
0
26 Sep 2021
Predicting Attention Sparsity in Transformers
Marcos Vinícius Treviso
António Góis
Patrick Fernandes
E. Fonseca
André F. T. Martins
370
17
0
24 Sep 2021
Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information?
Julia Rozanova
Deborah Ferreira
K. Dubba
Weiwei Cheng
Dell Zhang
André Freitas
LM&Ro
161
12
0
17 Sep 2021
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
412
58
0
15 Sep 2021
The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders
Han He
Jinho Choi
264
129
0
14 Sep 2021
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
293
35
0
13 Sep 2021
Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions
Javier Ferrando
Marta R. Costa-jussà
124
16
0
13 Sep 2021
GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks
Weicheng Ma
Renze Lou
Kai Zhang
Lili Wang
Soroush Vosoughi
179
9
0
13 Sep 2021
Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model
Shaolei Zhang
Yang Feng
247
24
0
11 Sep 2021
Document-level Entity-based Extraction as Template Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kung-Hsiang Huang
Sam Tang
Nanyun Peng
159
62
0
10 Sep 2021
Block Pruning For Faster Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
François Lagunas
Ella Charlaix
Victor Sanh
Alexander M. Rush
VLM
246
252
0
10 Sep 2021
Bag of Tricks for Optimizing Transformer Efficiency
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ye Lin
Yanyang Li
Tong Xiao
Jingbo Zhu
131
7
0
09 Sep 2021
Transformers in the loop: Polarity in neural models of language
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Lisa Bylinina
Alexey Tikhonov
129
0
0
08 Sep 2021
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Potsawee Manakul
Mark Gales
114
5
0
08 Sep 2021
Interactively Providing Explanations for Transformer Language Models
Felix Friedrich
P. Schramowski
Christopher Tauchmann
Kristian Kersting
LRM
420
6
0
02 Sep 2021
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
G. Chrysostomou
Nikolaos Aletras
205
22
0
31 Aug 2021
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Raymond Li
Wen Xiao
Lanjun Wang
Hyeju Jang
Giuseppe Carenini
ViT
161
24
0
31 Aug 2021
Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning
Ran Tian
Joshua Maynez
Ankur P. Parikh
ViT
140
2
0
30 Aug 2021
Layer-wise Model Pruning based on Mutual Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chun Fan
Jiwei Li
Xiang Ao
Leilei Gan
Yuxian Meng
Xiaofei Sun
158
23
0
28 Aug 2021
Fine-Tuning Pretrained Language Models With Label Attention for Biomedical Text Classification
Bruce Nguyen
Shaoxiong Ji
MedIm
200
7
0
26 Aug 2021
Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks
Weicheng Ma
Kai Zhang
Renze Lou
Lili Wang
Soroush Vosoughi
749
22
0
18 Aug 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM
LM&MA
313
314
0
12 Aug 2021
Adaptive Multi-Resolution Attention with Linear Complexity
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Yao Zhang
Yunpu Ma
T. Seidl
Volker Tresp
118
2
0
10 Aug 2021
Differentiable Subset Pruning of Transformer Heads
Transactions of the Association for Computational Linguistics (TACL), 2021
Jiaoda Li
Robert Bamler
Mrinmaya Sachan
355
63
0
10 Aug 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
Neural Information Processing Systems (NeurIPS), 2021
T. Nguyen
Vai Suliafu
Stanley J. Osher
Long Chen
Bao Wang
151
38
0
05 Aug 2021
A Dynamic Head Importance Computation Mechanism for Neural Machine Translation
Akshay Goindani
Manish Shrivastava
130
5
0
03 Aug 2021
Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability
Roman Levin
Manli Shu
Eitan Borgnia
Furong Huang
Micah Goldblum
Tom Goldstein
FAtt
AAML
119
12
0
03 Aug 2021
Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
IEEE International Conference on Computer Vision (ICCV), 2021
Ben Saunders
Necati Cihan Camgöz
Richard Bowden
SLR
267
74
0
23 Jul 2021
More Parameters? No Thanks!
Findings of the Association for Computational Linguistics, 2021
Zeeshan Khan
Kartheek Akella
Vinay P. Namboodiri
C. V. Jawahar
110
1
0
20 Jul 2021
Learned Token Pruning for Transformers
Sehoon Kim
Sheng Shen
D. Thorsley
A. Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
355
193
0
02 Jul 2021
A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni
Gowtham Ramesh
Mitesh M. Khapra
Anoop Kunchukuttan
Pratyush Kumar
LRM
224
87
0
01 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
300
324
0
01 Jul 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis
International Conference on Learning Representations (ICLR), 2021
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander D'Amour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
339
100
0
30 Jun 2021
It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Findings of the Association for Computational Linguistics, 2021
Alexey Tikhonov
Max Ryabinin
LRM
237
75
0
22 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
190
57
0
19 Jun 2021
Soft Attention: Does it Actually Help to Learn Social Interactions in Pedestrian Trajectory Prediction?
L. Boucaud
Daniel Aloise
Nicolas Saunier
HAI
107
0
0
16 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
176
82
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
392
995
0
14 Jun 2021
Why Can You Lay Off Heads? Investigating How BERT Heads Transfer
Ting-Rui Chiang
Yun-Nung Chen
92
0
0
14 Jun 2021
Page 11 of 15