ResearchTrend.AI

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
arXiv: 1905.09418

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

Showing 50 of 743 citing papers.
Acceptability Judgements via Examining the Topology of Attention Maps
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
D. Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, S. Barannikov, Irina Piontkovskaya, D. Piontkovski, Evgeny Burnaev
19 May 2022

Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-jussà
14 May 2022

A Study of the Attention Abnormality in Trojaned BERTs
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Weimin Lyu, Songzhu Zheng, Teng Ma, Chao Chen
13 May 2022

EigenNoise: A Contrastive Prior to Warm-Start Representations
H. Heidenreich, Jake Williams
09 May 2022

Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
Computational Linguistics and Intellectual Technologies (CLIT), 2022
A. Kolesnikova, Yuri Kuratov, Vasily Konovalov, Andrey Kravchenko
04 May 2022

Adaptable Adapters
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych
03 May 2022

Visualizing and Explaining Language Models
Adrian M. P. Braşoveanu, Razvan Andonie
30 Apr 2022

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai, Ji Lin, Chengyue Wu, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
25 Apr 2022

Merging of neural networks
Neural Processing Letters (NPL), 2022
Martin Pasen, Vladimír Boza
21 Apr 2022

Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures
Giovanni Bonetta, Matteo Ribero, R. Cancelliere
11 Apr 2022

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang, Zhumin Chen, Zhaochun Ren, Huasheng Liang, Qiang Yan, Sudipta Singha Roy
06 Apr 2022

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yanyang Li, Fuli Luo, Runxin Xu, Songfang Huang, Fei Huang, Liwei Wang
06 Apr 2022

CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Nishant Kambhatla, Logan Born, Anoop Sarkar
01 Apr 2022

Structured Pruning Learns Compact and Accurate Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mengzhou Xia, Zexuan Zhong, Danqi Chen
01 Apr 2022

TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ziqing Yang, Yiming Cui, Zhigang Chen
30 Mar 2022

Fine-Grained Visual Entailment
European Conference on Computer Vision (ECCV), 2022
Christopher Thomas, Yipeng Zhang, Shih-Fu Chang
29 Mar 2022

A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
29 Mar 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin
27 Mar 2022

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, ..., David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
24 Mar 2022

Input-specific Attention Subnetworks for Adversarial Detection
Findings, 2022
Emil Biju, Anirudh Sriram, Pratyush Kumar, Mitesh M Khapra
23 Mar 2022

Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji
23 Mar 2022

Task-guided Disentangled Tuning for Pretrained Language Models
Findings, 2022
Jiali Zeng, Yu Jiang, Shuangzhi Wu, Yongjing Yin, Mu Li
22 Mar 2022

Word Order Does Matter (And Shuffled Language Models Know It)
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Vinit Ravishankar, Mostafa Abdou, Artur Kulmizev, Anders Søgaard
21 Mar 2022

Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová, Marian Verhelst
20 Mar 2022

Gaussian Multi-head Attention for Simultaneous Machine Translation
Findings, 2022
Shaolei Zhang, Yang Feng
17 Mar 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Eldar Kurtic, Daniel Fernando Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Ben Fineran, Michael Goin, Dan Alistarh
14 Mar 2022

A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification
Findings, 2022
Dairui Liu, Derek Greene, Ruihai Dong
14 Mar 2022

Visualizing and Understanding Patch Interactions in Vision Transformer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei
11 Mar 2022

Data-Efficient Structured Pruning via Submodular Optimization
Neural Information Processing Systems (NeurIPS), 2022
Marwa El Halabi, Suraj Srinivas, Damien Scieur
09 Mar 2022

Understanding microbiome dynamics via interpretable graph representation learning
Scientific Reports (Sci Rep), 2022
K. Melnyk, Kuba Weimann, Tim Conrad
02 Mar 2022

XAI for Transformers: Better Explanations through Conservative Propagation
International Conference on Machine Learning (ICML), 2022
Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf
15 Feb 2022

A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu, Julian McAuley
15 Feb 2022

ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
Neurocomputing, 2022
J. Tan, Y. Tan, C. Chan, Joon Huang Chuah
11 Feb 2022

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
International Conference on Learning Representations (ICLR), 2022
Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, T. Zhao
06 Feb 2022

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
29 Jan 2022

Rethinking Attention-Model Explainability through Faithfulness Violation Test
International Conference on Machine Learning (ICML), 2022
Zichen Liu, Haoliang Li, Yangyang Guo, Chen Kong, Jing Li, Shiqi Wang
28 Jan 2022

Can Model Compression Improve NLP Fairness
Guangxuan Xu, Qingyuan Hu
21 Jan 2022

21 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
Latency Adjustable Transformer Encoder for Language UnderstandingIEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
603
1
0
10 Jan 2022
Intelligent Online Selling Point Extraction for E-Commerce Recommendation
Xiaojie Guo, Shugen Wang, Hanqing Zhao, Shiliang Diao, Jiajia Chen, ..., Zhen He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu
16 Dec 2021

Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao, Leon Schmid, Dieuwke Hupkes, Ivan Titov
13 Dec 2021

13 Dec 2021
On the Compression of Natural Language Models
On the Compression of Natural Language Models
S. Damadi
119
0
0
13 Dec 2021
Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini
10 Dec 2021

Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View
WIREs Mechanisms of Disease (WIREs Mech Dis), 2021
Di Jin, Elena Sergeeva, W. Weng, Geeticka Chauhan, Peter Szolovits
05 Dec 2021

Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan, Xin Zhang, Sheng Zhang, Yan Pan, Lixiang Guo
22 Nov 2021

Does BERT look at sentiment lexicon?
International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2021
E. Razova, S. Vychegzhanin, Evgeny Kotelnikov
19 Nov 2021

Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro, Valerio Basile, Viviana Bono, Sara Gallo
14 Nov 2021

A Survey on Green Deep Learning
Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
08 Nov 2021

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
07 Nov 2021

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems (NeurIPS), 2021
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
29 Oct 2021

Page 10 of 15