v1v2 (latest)

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

23 May 2019

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

50 / 742 papers shown

Numerical Optimizations for Weighted Low-rank Estimation on Language ModelConference on Empirical Methods in Natural Language Processing (EMNLP), 2022

228

02 Nov 2022

Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks

Rochelle Choenni

Dan Garrette

Ekaterina Shutova

301

31 Oct 2022

Modeling structure-building in the brain with CCG parsing and large language modelsCognitive Sciences (CS), 2022

195

28 Oct 2022

Towards Improving Workers' Safety and Progress Monitoring of Construction Sites Through Construction Site Understanding

Mahdi Bonyani

Maryam Soleymani

136

27 Oct 2022

Benchmarking Language Models for Code Syntax UnderstandingConference on Empirical Methods in Natural Language Processing (EMNLP), 2022

166

26 Oct 2022

Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models

Stelios Maroudas

Sotiris Legkas

Prodromos Malakasiotis

Ilias Chalkidis

VLM AILaw ALM ELM

265

24 Oct 2022

Is Encoder-Decoder Redundant for Neural Machine Translation?

204

21 Oct 2022

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer LearningConference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Qing Qu

235

18 Oct 2022

Token Merging: Your ViT But FasterInternational Conference on Learning Representations (ICLR), 2022

Christoph Feichtenhofer

Judy Hoffman

MoMe

419

735

17 Oct 2022

Shapley Head Pruning: Identifying and Removing Interference in Multilingual TransformersConference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

William B. Held

Diyi Yang

VLM

238

11 Oct 2022

Mixture of Attention Heads: Selecting Attention Heads Per TokenConference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Jie Zhou

613

11 Oct 2022

Metaphorical Paraphrase Generation: Feeding Metaphorical Language Models with Literal Texts

Giorgio Ottolina

John Pavlopoulos

171

10 Oct 2022

Parameter-Efficient Tuning with Special Token AdaptationConference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

241

10 Oct 2022

Better Pre-Training by Reducing Representation ConfusionFindings (Findings), 2022

Mingfei Liang

126

09 Oct 2022

Breaking BERT: Evaluating and Optimizing Sparsified Attention

Siddhartha Brahma

Polina Zablotskaia

David M. Mimno

163

07 Oct 2022

Masked Spiking TransformerIEEE International Conference on Computer Vision (ICCV), 2022

Ziqing Wang

Yuetong Fang

Jiahang Cao

Qiang Zhang

Zhongrui Wang

Renjing Xu

206

03 Oct 2022

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Yuxuan Li

James L. McClelland

472

02 Oct 2022

Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks

Yong Liu

...

202

25 Sep 2022

In-context Learning and Induction Heads

...

609

722

24 Sep 2022

Towards Faithful Model Explanation in NLP: A SurveyComputational Linguistics (CL), 2022

Qing Lyu

Marianna Apidianaki

Chris Callison-Burch

XAI

537

172

22 Sep 2022

Relaxed Attention for Transformer ModelsIEEE International Joint Conference on Neural Network (IJCNN), 2022

183

20 Sep 2022

Hydra Attention: Efficient Attention with Many Heads

313

100

15 Sep 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech RecognitionInterspeech (Interspeech), 2022

Kartik Audhkhasi

Yinghui Huang

Bhuvana Ramabhadran

Pedro J. Moreno

136

13 Sep 2022

Analyzing Transformers in Embedding SpaceAnnual Meeting of the Association for Computational Linguistics (ACL), 2022

336

124

06 Sep 2022

Efficient Methods for Natural Language Processing: A SurveyTransactions of the Association for Computational Linguistics (TACL), 2022

Marcos Vinícius Treviso

...

Niranjan Balasubramanian

Leon Derczynski

Iryna Gurevych

Roy Schwartz

410

142

31 Aug 2022

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad RelevanceInternational Conference on Information and Knowledge Management (CIKM), 2022

Yujing Wang

179

30 Aug 2022

Survey: Exploiting Data Redundancy for Optimization of Deep LearningACM Computing Surveys (ACM CSUR), 2022

170

29 Aug 2022

Combining Compressions for Multiplicative Size Scaling on Natural Language TasksInternational Conference on Computational Linguistics (COLING), 2022

170

20 Aug 2022

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine TranslationConference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Nuno M. Guerreiro

Elena Voita

André F. T. Martins

HILM

300

10 Aug 2022

Attention Hijacking in Trojan Transformers

193

09 Aug 2022

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Tilman Raukur

A. Ho

Stephen Casper

Dylan Hadfield-Menell

AAML AI4CE

785

170

27 Jul 2022

Revealing Secrets From Pre-trained Models

Mujahid Al Rafi

Yuan Feng

Hyeran Jeon

172

19 Jul 2022

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic SegmentationPattern Recognition (Pattern Recogn.), 2022

223

12 Jul 2022

STI: Turbocharge NLP Inference at the Edge via Elastic PipeliningInternational Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022

Liwei Guo

Wonkyo Choe

F. Lin

199

11 Jul 2022