Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
Fedor Moiseev
Rico Sennrich
Ivan Titov

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

50 / 741 papers shown
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
10 Jun 2021
Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Tyler A. Chang
Yifan Xu
Weijian Xu
Zhuowen Tu
10 Jun 2021
Patch Slimming for Efficient Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Yehui Tang
Kai Han
Yunhe Wang
Chang Xu
Jianyuan Guo
Chao Xu
Dacheng Tao
05 Jun 2021
On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
Findings of the Association for Computational Linguistics, 2021
Tianchu Ji
Shraddhan Jain
M. Ferdman
Peter Milder
H. Andrew Schwartz
Niranjan Balasubramanian
02 Jun 2021
Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?
Findings of the Association for Computational Linguistics, 2021
Min Namgung
Laurent Besacier
Vassilina Nikoulina
D. Schwab
31 May 2021
Cascaded Head-colliding Attention
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Lin Zheng
Zhiyong Wu
Lingpeng Kong
31 May 2021
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
Pattern Recognition Letters (PR), 2021
David Peer
Sebastian Stabinger
Stefan Engl
A. Rodríguez-Sánchez
31 May 2021
On Compositional Generalization of Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yafu Li
Yongjing Yin
Yulong Chen
Yue Zhang
31 May 2021
On the Interplay Between Fine-tuning and Composition in Transformers
Findings of the Association for Computational Linguistics, 2021
Lang-Chi Yu
Allyson Ettinger
31 May 2021
Cross-Lingual Abstractive Summarization with Limited Parallel Resources
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yu Bai
Yang Gao
Heyan Huang
28 May 2021
Inspecting the concept knowledge graph encoded by modern language models
Findings of the Association for Computational Linguistics, 2021
Carlos Aspillaga
Marcelo Mendoza
Alvaro Soto
27 May 2021
How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
Findings of the Association for Computational Linguistics, 2021
Weijia Xu
Shuming Ma
Dongdong Zhang
Marine Carpuat
27 May 2021
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond
Artificial Intelligence (AI), 2021
Daniel Loureiro
A. Jorge
Jose Camacho-Collados
26 May 2021
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Chen Liang
Simiao Zuo
Minshuo Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
T. Zhao
Weizhu Chen
25 May 2021
A Non-Linear Structural Probe
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Jennifer C. White
Tiago Pimentel
Naomi Saphra
Robert Bamler
21 May 2021
Medical Image Segmentation Using Squeeze-and-Expansion Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Shaohua Li
Xiuchao Sui
Xiangde Luo
Xinxing Xu
Yong Liu
Rick Siow Mong Goh
20 May 2021
Rationalization through Concepts
Findings of the Association for Computational Linguistics, 2021
Diego Antognini
Boi Faltings
11 May 2021
FNet: Mixing Tokens with Fourier Transforms
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
James Lee-Thorp
Joshua Ainslie
Ilya Eckstein
Santiago Ontanon
09 May 2021
Long-Span Summarization via Local Attention and Content Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Potsawee Manakul
Mark Gales
08 May 2021
Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses
Transactions of the Association for Computational Linguistics (TACL), 2021
Aina Garí Soler
Marianna Apidianaki
29 Apr 2021
Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention
Workshop on Cognitive Modeling and Computational Linguistics (CMCL), 2021
S. Ryu
Richard L. Lewis
26 Apr 2021
Easy and Efficient Transformer: Scalable Inference Solution For large NLP model
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
GongZheng Li
Yadong Xi
Jingzhen Ding
Duan Wang
Bai Liu
Changjie Fan
Xiaoxi Mao
Zeng Zhao
26 Apr 2021
Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
International Conference on Artificial Neural Networks (ICANN), 2021
Cheng Chen
Yichun Yin
Lifeng Shang
Zhi Wang
Xin Jiang
Xiao Chen
Qun Liu
24 Apr 2021
Code Structure Guided Transformer for Source Code Summarization
ACM Transactions on Software Engineering and Methodology (TOSEM), 2021
Shuzheng Gao
Cuiyun Gao
Yulan He
Jichuan Zeng
L. Nie
Xin Xia
Michael R. Lyu
19 Apr 2021
BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models
International Workshop on Semantic Evaluation (SemEval), 2021
A. Islam
Weicheng Ma
Soroush Vosoughi
19 Apr 2021
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mozhdeh Gheini
Xiang Ren
Jonathan May
18 Apr 2021
Knowledge Neurons in Pretrained Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
18 Apr 2021
Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Dongkuan Xu
Ian En-Hsu Yen
Jinxi Zhao
Zhibin Xiao
18 Apr 2021
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Fangyu Liu
Ivan Vulić
Anna Korhonen
Nigel Collier
16 Apr 2021
Effect of Post-processing on Contextualized Word Representations
International Conference on Computational Linguistics (COLING), 2021
Hassan Sajjad
Firoj Alam
Fahim Dalvi
Nadir Durrani
15 Apr 2021
Sparse Attention with Linear Units
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Biao Zhang
Ivan Titov
Rico Sennrich
14 Apr 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Journal of Artificial Intelligence Research (JAIR), 2021
Danielle Saunders
14 Apr 2021
DirectProbe: Studying Representations without Classifiers
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Yichu Zhou
Vivek Srikumar
13 Apr 2021
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Zhen Wu
Lijun Wu
Qi Meng
Ziheng Lu
Shufang Xie
Tao Qin
Xinyu Dai
Tie-Yan Liu
11 Apr 2021
On Biasing Transformer Attention Towards Monotonicity
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Annette Rios Gonzales
Chantal Amrhein
Noëmi Aepli
Rico Sennrich
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Computer Vision and Pattern Recognition (CVPR), 2021
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
08 Apr 2021
Attention Head Masking for Inference Time Content Selection in Abstractive Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Shuyang Cao
Lu Wang
06 Apr 2021
Efficient Attentions for Long Document Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
L. Huang
Shuyang Cao
Nikolaus Nova Parulian
Heng Ji
Lu Wang
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
02 Apr 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Hila Chefer
Shir Gur
Lior Wolf
29 Mar 2021
Learning on heterogeneous graphs using high-order relations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
See Hian Lee
Feng Ji
Wee Peng Tay
29 Mar 2021
Dodrio: Exploring Transformer Models with Interactive Visualization
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Zijie J. Wang
Robert Turko
Duen Horng Chau
26 Mar 2021
Understanding Robustness of Transformers for Image Classification
IEEE International Conference on Computer Vision (ICCV), 2021
Srinadh Bhojanapalli
Ayan Chakrabarti
Daniel Glasner
Daliang Li
Thomas Unterthiner
Andreas Veit
26 Mar 2021
Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Shuhao Gu
Yang Feng
Wanying Xie
25 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
AAAI Conference on Artificial Intelligence (AAAI), 2021
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
24 Mar 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
IEEE Access, 2021
Sushant Singh
A. Mahmood
23 Mar 2021
Learning Calibrated-Guidance for Object Detection in Aerial Images
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (J-STARS), 2021
Zongqi Wei
Dong Liang
Dong Zhang
Liyan Zhang
Qixiang Geng
Mingqiang Wei
Huiyu Zhou
21 Mar 2021
Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond
Knowledge and Information Systems (KAIS), 2021
Xuhong Li
Haoyi Xiong
Xingjian Li
Xuanyu Wu
Xiao Zhang
Ji Liu
Jiang Bian
Dejing Dou
19 Mar 2021
Approximating How Single Head Attention Learns
Charles Burton Snell
Ruiqi Zhong
Dan Klein
Jacob Steinhardt
13 Mar 2021
An empirical analysis of phrase-based and neural machine translation
Hamidreza Ghader
04 Mar 2021
Page 12 of 15