Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

Showing 50 of 742 citing papers.
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ting Hua, Yen-Chang Hsu, Felicity Wang, Qiang Lou, Yilin Shen, Hongxia Jin
02 Nov 2022

Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks
Rochelle Choenni, Dan Garrette, Ekaterina Shutova
31 Oct 2022

Modeling structure-building in the brain with CCG parsing and large language models
Cognitive Sciences (CS), 2022
Miloš Stanojević, Jonathan Brennan, Donald Dunagan, Mark Steedman, John T. Hale
28 Oct 2022

Towards Improving Workers' Safety and Progress Monitoring of Construction Sites Through Construction Site Understanding
Mahdi Bonyani, Maryam Soleymani
27 Oct 2022

Benchmarking Language Models for Code Syntax Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Da Shen, Xinyun Chen, Chenguang Wang, Koushik Sen, Dawn Song
26 Oct 2022

Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models
Stelios Maroudas, Sotiris Legkas, Prodromos Malakasiotis, Ilias Chalkidis
24 Oct 2022

Is Encoder-Decoder Redundant for Neural Machine Translation?
Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney
21 Oct 2022

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
18 Oct 2022

Token Merging: Your ViT But Faster
International Conference on Learning Representations (ICLR), 2022
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman
17 Oct 2022

Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
William B. Held, Diyi Yang
11 Oct 2022

Mixture of Attention Heads: Selecting Attention Heads Per Token
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xiaofeng Zhang, Songlin Yang, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
11 Oct 2022

Metaphorical Paraphrase Generation: Feeding Metaphorical Language Models with Literal Texts
Giorgio Ottolina, John Pavlopoulos
10 Oct 2022

Parameter-Efficient Tuning with Special Token Adaptation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen
10 Oct 2022

Better Pre-Training by Reducing Representation Confusion
Findings of the ACL, 2022
Haojie Zhang, Mingfei Liang, Ruobing Xie, Zhen Sun, Bo Zhang, Leyu Lin
09 Oct 2022

Breaking BERT: Evaluating and Optimizing Sparsified Attention
Siddhartha Brahma, Polina Zablotskaia, David M. Mimno
07 Oct 2022

Masked Spiking Transformer
IEEE International Conference on Computer Vision (ICCV), 2022
Ziqing Wang, Yuetong Fang, Jiahang Cao, Qiang Zhang, Zhongrui Wang, Renjing Xu
03 Oct 2022

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Yuxuan Li, James L. McClelland
02 Oct 2022

Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks
Xiaofeng Lei, Shaohua Li, Xinxing Xu, Huazhu Fu, Yong Liu, ..., Mingrui Tan, Yanyu Xu, Jocelyn Hui Lin Goh, Rick Siow Mong Goh, Ching-Yu Cheng
25 Sep 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022

Towards Faithful Model Explanation in NLP: A Survey
Computational Linguistics (CL), 2022
Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
22 Sep 2022

Relaxed Attention for Transformer Models
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt
20 Sep 2022

Hydra Attention: Efficient Attention with Many Heads
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman
15 Sep 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Interspeech, 2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno
13 Sep 2022

Analyzing Transformers in Embedding Space
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
06 Sep 2022

Efficient Methods for Natural Language Processing: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2022
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
31 Aug 2022

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance
International Conference on Information and Knowledge Management (CIKM), 2022
Li Zhang, Youkow Homma, Yujing Wang, Ruibing Jin, Mao Yang, Ruofei Zhang, Ting Cao, Wei Shen
30 Aug 2022

Survey: Exploiting Data Redundancy for Optimization of Deep Learning
ACM Computing Surveys (ACM CSUR), 2022
Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen
29 Aug 2022

Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
International Conference on Computational Linguistics (COLING), 2022
Rajiv Movva, Jinhao Lei, Shayne Longpre, Ajay K. Gupta, Chris DuBois
20 Aug 2022

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Nuno M. Guerreiro, Elena Voita, André F. T. Martins
10 Aug 2022

Attention Hijacking in Trojan Transformers
Weimin Lyu, Songzhu Zheng, Teng Ma, Haibin Ling, Chao Chen
09 Aug 2022

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Raukur, A. Ho, Stephen Casper, Dylan Hadfield-Menell
27 Jul 2022

Revealing Secrets From Pre-trained Models
Mujahid Al Rafi, Yuan Feng, Hyeran Jeon
19 Jul 2022

eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation
Pattern Recognition (Pattern Recogn.), 2022
Lu Yu, Wei Xiang, Juan Fang, Yi-Ping Phoebe Chen, Lianhua Chi
12 Jul 2022

STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022
Liwei Guo, Wonkyo Choe, F. Lin
11 Jul 2022

Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning
Przemyslaw K. Joniak, Akiko Aizawa
06 Jul 2022

Probing via Prompting
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Jiaoda Li, Robert Bamler, Mrinmaya Sachan
04 Jul 2022

The Topological BERT: Transforming Attention into Topology for Natural Language Processing
Ilan Perez, Raphael Reinauer
30 Jun 2022

Discovering Salient Neurons in Deep NLP Models
Journal of Machine Learning Research (JMLR), 2022
Nadir Durrani, Fahim Dalvi, Hassan Sajjad
27 Jun 2022

Visualizing and Understanding Contrastive Learning
IEEE Transactions on Image Processing (IEEE TIP), 2022
Fawaz Sammani, Boris Joukovsky, Nikos Deligiannis
20 Jun 2022

20 Jun 2022
Location-based Twitter Filtering for the Creation of Low-Resource
  Language Datasets in Indonesian Local Languages
Location-based Twitter Filtering for the Creation of Low-Resource Language Datasets in Indonesian Local Languages
Mukhlis Amien
Chong Feng
Heyan Huang
170
3
0
15 Jun 2022
Unveiling Transformers with LEGO: a synthetic reasoning task
Yi Zhang, A. Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Tal Wagner
09 Jun 2022

Optimizing Relevance Maps of Vision Transformers Improves Robustness
Neural Information Processing Systems (NeurIPS), 2022
Hila Chefer, Idan Schwartz, Lior Wolf
02 Jun 2022

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li, Junyu Chen, Yucheng Tang, Ce Wang, Bennett A. Landman, S. K. Zhou
02 Jun 2022

Transformer with Fourier Integral Attentions
T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho
01 Jun 2022

Lack of Fluency is Hurting Your Translation Model
J. Yoo, Jaewoo Kang
24 May 2022

Life after BERT: What do Other Muppets Understand about Language?
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky
21 May 2022

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding
Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, ..., Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
21 May 2022

Exploring Extreme Parameter Compression for Pre-trained Language Models
International Conference on Learning Representations (ICLR), 2022
Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu
20 May 2022

Foundation Posteriors for Approximate Probabilistic Inference
Neural Information Processing Systems (NeurIPS), 2022
Mike Wu, Noah D. Goodman
19 May 2022

Acceptability Judgements via Examining the Topology of Attention Maps
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
D. Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, S. Barannikov, Irina Piontkovskaya, D. Piontkovski, Evgeny Burnaev
19 May 2022

Page 9 of 15