Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
arXiv: 1905.09418 (abs / PDF / HTML)
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 / 742 papers shown
Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Gerard Sant
Gerard I. Gállego
Belen Alastruey
Marta R. Costa-jussà
172
4
0
14 May 2022
A Study of the Attention Abnormality in Trojaned BERTs
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Weimin Lyu
Songzhu Zheng
Teng Ma
Chao Chen
295
66
0
13 May 2022
EigenNoise: A Contrastive Prior to Warm-Start Representations
H. Heidenreich
Jake Williams
131
1
0
09 May 2022
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
Computational Linguistics and Intellectual Technologies (CLIT), 2022
A. Kolesnikova
Yuri Kuratov
Vasily Konovalov
Andrey Kravchenko
VLM
124
12
0
04 May 2022
Adaptable Adapters
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
N. Moosavi
Quentin Delfosse
Kristian Kersting
Iryna Gurevych
203
20
0
03 May 2022
Visualizing and Explaining Language Models
Adrian M. P. Braşoveanu
Razvan Andonie
MILM
VLM
326
7
0
30 Apr 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
280
297
0
27 Apr 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Chengyue Wu
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
259
133
0
25 Apr 2022
Merging of neural networks
Neural Processing Letters (NPL), 2022
Martin Pasen
Vladimír Boža
FedML
MoMe
186
3
0
21 Apr 2022
Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures
Giovanni Bonetta
Matteo Ribero
R. Cancelliere
187
9
0
11 Apr 2022
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang
Zhumin Chen
Zhaochun Ren
Huasheng Liang
Qiang Yan
Sudipta Singha Roy
126
10
0
06 Apr 2022
Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yanyang Li
Fuli Luo
Runxin Xu
Songfang Huang
Fei Huang
Liwei Wang
157
3
0
06 Apr 2022
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Nishant Kambhatla
Logan Born
Anoop Sarkar
190
18
0
01 Apr 2022
Structured Pruning Learns Compact and Accurate Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
387
214
0
01 Apr 2022
TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ziqing Yang
Yiming Cui
Zhigang Chen
SyDa
VLM
159
14
0
30 Mar 2022
Fine-Grained Visual Entailment
European Conference on Computer Vision (ECCV), 2022
Christopher Thomas
Yipeng Zhang
Shih-Fu Chang
333
7
0
29 Mar 2022
A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
243
206
0
29 Mar 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
183
20
0
27 Mar 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
226
133
0
24 Mar 2022
Input-specific Attention Subnetworks for Adversarial Detection
Findings (Findings), 2022
Emil Biju
Anirudh Sriram
Pratyush Kumar
Mitesh M Khapra
AAML
158
5
0
23 Mar 2022
Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
183
56
0
23 Mar 2022
Task-guided Disentangled Tuning for Pretrained Language Models
Findings (Findings), 2022
Jiali Zeng
Yu Jiang
Shuangzhi Wu
Yongjing Yin
Mu Li
DRL
298
3
0
22 Mar 2022
Word Order Does Matter (And Shuffled Language Models Know It)
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Vinit Ravishankar
Mostafa Abdou
Artur Kulmizev
Anders Søgaard
196
50
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
288
7
0
20 Mar 2022
Gaussian Multi-head Attention for Simultaneous Machine Translation
Findings (Findings), 2022
Shaolei Zhang
Yang Feng
150
26
0
17 Mar 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLM
MQ
MedIm
395
146
0
14 Mar 2022
A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification
Findings (Findings), 2022
Dairui Liu
Derek Greene
Ruihai Dong
242
14
0
14 Mar 2022
Visualizing and Understanding Patch Interactions in Vision Transformer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Jie Ma
Yalong Bai
Bineng Zhong
Wei Zhang
Ting Yao
Tao Mei
ViT
147
52
0
11 Mar 2022
Data-Efficient Structured Pruning via Submodular Optimization
Neural Information Processing Systems (NeurIPS), 2022
Marwa El Halabi
Suraj Srinivas
Damien Scieur
410
22
0
09 Mar 2022
Understanding microbiome dynamics via interpretable graph representation learning
Scientific Reports (Sci Rep), 2022
K. Melnyk
Kuba Weimann
Tim Conrad
212
7
0
02 Mar 2022
XAI for Transformers: Better Explanations through Conservative Propagation
International Conference on Machine Learning (ICML), 2022
Ameen Ali
Thomas Schnake
Oliver Eberle
G. Montavon
Klaus-Robert Müller
Lior Wolf
FAtt
332
128
0
15 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu
Julian McAuley
359
87
0
15 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
Neurocomputing (Neurocomputing), 2022
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
206
23
0
11 Feb 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
International Conference on Learning Representations (ICLR), 2022
Chen Liang
Haoming Jiang
Simiao Zuo
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
T. Zhao
182
17
0
06 Feb 2022
AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu
Subhabrata Mukherjee
Xiaodong Liu
Debadeepta Dey
Wenhui Wang
Xiang Zhang
Ahmed Hassan Awadallah
Jianfeng Gao
202
5
0
29 Jan 2022
Rethinking Attention-Model Explainability through Faithfulness Violation Test
International Conference on Machine Learning (ICML), 2022
Zichen Liu
Haoliang Li
Yangyang Guo
Chen Kong
Jing Li
Shiqi Wang
FAtt
328
57
0
28 Jan 2022
Can Model Compression Improve NLP Fairness
Guangxuan Xu
Qingyuan Hu
146
30
0
21 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
590
1
0
10 Jan 2022
Intelligent Online Selling Point Extraction for E-Commerce Recommendation
Xiaojie Guo
Shugen Wang
Hanqing Zhao
Shiliang Diao
Jiajia Chen
...
Zhen He
Yun Xiao
Bo Long
Han Yu
Lingfei Wu
144
18
0
16 Dec 2021
Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao
Leon Schmid
Dieuwke Hupkes
Ivan Titov
185
32
0
13 Dec 2021
On the Compression of Natural Language Models
S. Damadi
92
0
0
13 Dec 2021
Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li
Wen Xiao
Linzi Xing
Lanjun Wang
Gabriel Murray
Giuseppe Carenini
ViT
266
9
0
10 Dec 2021
Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View
WIREs Mechanisms of Disease (WIREs Mech Dis), 2021
Di Jin
Elena Sergeeva
W. Weng
Geeticka Chauhan
Peter Szolovits
OOD
287
76
0
05 Dec 2021
Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan
Xin Zhang
Sheng Zhang
Yan Pan
Lixiang Guo
MQ
183
0
0
22 Nov 2021
Does BERT look at sentiment lexicon?
International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2021
E. Razova
S. Vychegzhanin
Evgeny Kotelnikov
181
3
0
19 Nov 2021
Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro
Valerio Basile
Viviana Bono
Sara Gallo
ViT
316
62
0
14 Nov 2021
A Survey on Green Deep Learning
Jingjing Xu
Wangchunshu Zhou
Zhiyi Fu
Hao Zhou
Lei Li
VLM
457
94
0
08 Nov 2021
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao
Yanan Zheng
Xiaocong Yang
Zhilin Yang
280
50
0
07 Nov 2021
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems (NeurIPS), 2021
Yifan Chen
Qi Zeng
Heng Ji
Yun Yang
227
63
0
29 Oct 2021
Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures
Archit Parnami
Rahul Singh
Tarun Joshi
232
7
0
28 Oct 2021
Page 10 of 15