

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

50 / 742 papers shown
Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Gerard Sant
Gerard I. Gállego
Belen Alastruey
Marta R. Costa-jussà
172
4
0
14 May 2022
A Study of the Attention Abnormality in Trojaned BERTs
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Weimin Lyu
Songzhu Zheng
Teng Ma
Chao Chen
295
66
0
13 May 2022
EigenNoise: A Contrastive Prior to Warm-Start Representations
H. Heidenreich
Jake Williams
131
1
0
09 May 2022
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
Computational Linguistics and Intellectual Technologies (CLIT), 2022
A. Kolesnikova
Yuri Kuratov
Vasily Konovalov
Andrey Kravchenko
VLM
124
12
0
04 May 2022
Adaptable Adapters
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
N. Moosavi
Quentin Delfosse
Kristian Kersting
Iryna Gurevych
203
20
0
03 May 2022
Visualizing and Explaining Language Models
Adrian M. P. Braşoveanu
Razvan Andonie
MILM, VLM
326
7
0
30 Apr 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
280
297
0
27 Apr 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Chengyue Wu
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
259
133
0
25 Apr 2022
Merging of neural networks
Neural Processing Letters (NPL), 2022
Martin Pasen
Vladimír Boza
FedML, MoMe
186
3
0
21 Apr 2022
Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures
Giovanni Bonetta
Matteo Ribero
R. Cancelliere
187
9
0
11 Apr 2022
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang
Zhumin Chen
Zhaochun Ren
Huasheng Liang
Qiang Yan
Sudipta Singha Roy
126
10
0
06 Apr 2022
Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yanyang Li
Fuli Luo
Runxin Xu
Songfang Huang
Fei Huang
Liwei Wang
157
3
0
06 Apr 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Nishant Kambhatla
Logan Born
Anoop Sarkar
190
18
0
01 Apr 2022
Structured Pruning Learns Compact and Accurate Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
387
214
0
01 Apr 2022
TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ziqing Yang
Yiming Cui
Zhigang Chen
SyDa, VLM
159
14
0
30 Mar 2022
Fine-Grained Visual Entailment
European Conference on Computer Vision (ECCV), 2022
Christopher Thomas
Yipeng Zhang
Shih-Fu Chang
333
7
0
29 Mar 2022
A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
243
206
0
29 Mar 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
183
20
0
27 Mar 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
226
133
0
24 Mar 2022
Input-specific Attention Subnetworks for Adversarial Detection
Findings, 2022
Emil Biju
Anirudh Sriram
Pratyush Kumar
Mitesh M Khapra
AAML
158
5
0
23 Mar 2022
Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
183
56
0
23 Mar 2022
Task-guided Disentangled Tuning for Pretrained Language Models
Findings, 2022
Jiali Zeng
Yu Jiang
Shuangzhi Wu
Yongjing Yin
Mu Li
DRL
298
3
0
22 Mar 2022
Word Order Does Matter (And Shuffled Language Models Know It)
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Vinit Ravishankar
Mostafa Abdou
Artur Kulmizev
Anders Søgaard
196
50
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
288
7
0
20 Mar 2022
Gaussian Multi-head Attention for Simultaneous Machine Translation
Findings, 2022
Shaolei Zhang
Yang Feng
150
26
0
17 Mar 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLM, MQ, MedIm
395
146
0
14 Mar 2022
A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification
Findings, 2022
Dairui Liu
Derek Greene
Ruihai Dong
242
14
0
14 Mar 2022
Visualizing and Understanding Patch Interactions in Vision Transformer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Jie Ma
Yalong Bai
Bineng Zhong
Wei Zhang
Ting Yao
Tao Mei
ViT
147
52
0
11 Mar 2022
Data-Efficient Structured Pruning via Submodular Optimization
Neural Information Processing Systems (NeurIPS), 2022
Marwa El Halabi
Suraj Srinivas
Damien Scieur
410
22
0
09 Mar 2022
Understanding microbiome dynamics via interpretable graph representation learning
Scientific Reports (Sci Rep), 2022
K. Melnyk
Kuba Weimann
Tim Conrad
212
7
0
02 Mar 2022
XAI for Transformers: Better Explanations through Conservative Propagation
International Conference on Machine Learning (ICML), 2022
Ameen Ali
Thomas Schnake
Oliver Eberle
G. Montavon
Klaus-Robert Müller
Lior Wolf
FAtt
332
128
0
15 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu
Julian McAuley
359
87
0
15 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
Neurocomputing, 2022
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM, ViT
206
23
0
11 Feb 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
International Conference on Learning Representations (ICLR), 2022
Chen Liang
Haoming Jiang
Simiao Zuo
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
T. Zhao
182
17
0
06 Feb 2022
AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu
Subhabrata Mukherjee
Xiaodong Liu
Debadeepta Dey
Wenhui Wang
Xiang Zhang
Ahmed Hassan Awadallah
Jianfeng Gao
202
5
0
29 Jan 2022
Rethinking Attention-Model Explainability through Faithfulness Violation Test
International Conference on Machine Learning (ICML), 2022
Zichen Liu
Haoliang Li
Yangyang Guo
Chen Kong
Jing Li
Shiqi Wang
FAtt
328
57
0
28 Jan 2022
Can Model Compression Improve NLP Fairness
Guangxuan Xu
Qingyuan Hu
146
30
0
21 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
590
1
0
10 Jan 2022
Intelligent Online Selling Point Extraction for E-Commerce Recommendation
Xiaojie Guo
Shugen Wang
Hanqing Zhao
Shiliang Diao
Jiajia Chen
...
Zhen He
Yun Xiao
Bo Long
Han Yu
Lingfei Wu
144
18
0
16 Dec 2021
Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao
Leon Schmid
Dieuwke Hupkes
Ivan Titov
185
32
0
13 Dec 2021
On the Compression of Natural Language Models
S. Damadi
92
0
0
13 Dec 2021
Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li
Wen Xiao
Linzi Xing
Lanjun Wang
Gabriel Murray
Giuseppe Carenini
ViT
266
9
0
10 Dec 2021
Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View
WIREs Mechanisms of Disease (WIREs Mech Dis), 2021
Di Jin
Elena Sergeeva
W. Weng
Geeticka Chauhan
Peter Szolovits
OOD
287
76
0
05 Dec 2021
Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan
Xin Zhang
Sheng Zhang
Yan Pan
Lixiang Guo
MQ
183
0
0
22 Nov 2021
Does BERT look at sentiment lexicon?
International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2021
E. Razova
S. Vychegzhanin
Evgeny Kotelnikov
181
3
0
19 Nov 2021
Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro
Valerio Basile
Viviana Bono
Sara Gallo
ViT
316
62
0
14 Nov 2021
A Survey on Green Deep Learning
Jingjing Xu
Wangchunshu Zhou
Zhiyi Fu
Hao Zhou
Lei Li
VLM
457
94
0
08 Nov 2021
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao
Yanan Zheng
Xiaocong Yang
Zhilin Yang
280
50
0
07 Nov 2021
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems (NeurIPS), 2021
Yifan Chen
Qi Zeng
Heng Ji
Yun Yang
227
63
0
29 Oct 2021
Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures
Archit Parnami
Rahul Singh
Tarun Joshi
232
7
0
28 Oct 2021
Page 10 of 15