ResearchTrend.AI

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
arXiv: 1905.09418

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

Showing 50 of 743 citing papers.
Acceptability Judgements via Examining the Topology of Attention Maps
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
D. Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, S. Barannikov, Irina Piontkovskaya, D. Piontkovski, Evgeny Burnaev
19 May 2022

Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-jussà
14 May 2022

A Study of the Attention Abnormality in Trojaned BERTs
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Weimin Lyu, Songzhu Zheng, Teng Ma, Chao Chen
13 May 2022

EigenNoise: A Contrastive Prior to Warm-Start Representations
H. Heidenreich, Jake Williams
09 May 2022

Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
Computational Linguistics and Intellectual Technologies (CLIT), 2022
A. Kolesnikova, Yuri Kuratov, Vasily Konovalov, Andrey Kravchenko
04 May 2022

Adaptable Adapters
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych
03 May 2022

Visualizing and Explaining Language Models
Adrian M. P. Braşoveanu, Razvan Andonie
30 Apr 2022

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai, Ji Lin, Chengyue Wu, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
25 Apr 2022

Merging of neural networks
Neural Processing Letters (NPL), 2022
Martin Pasen, Vladimír Boza
21 Apr 2022

Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures
Giovanni Bonetta, Matteo Ribero, R. Cancelliere
11 Apr 2022

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang, Zhumin Chen, Zhaochun Ren, Huasheng Liang, Qiang Yan, Sudipta Singha Roy
06 Apr 2022

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yanyang Li, Fuli Luo, Runxin Xu, Songfang Huang, Fei Huang, Liwei Wang
06 Apr 2022

CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Nishant Kambhatla, Logan Born, Anoop Sarkar
01 Apr 2022

Structured Pruning Learns Compact and Accurate Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mengzhou Xia, Zexuan Zhong, Danqi Chen
01 Apr 2022

TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ziqing Yang, Yiming Cui, Zhigang Chen
30 Mar 2022

Fine-Grained Visual Entailment
European Conference on Computer Vision (ECCV), 2022
Christopher Thomas, Yipeng Zhang, Shih-Fu Chang
29 Mar 2022

A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
29 Mar 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin
27 Mar 2022

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, ..., David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
24 Mar 2022

Input-specific Attention Subnetworks for Adversarial Detection
Findings, 2022
Emil Biju, Anirudh Sriram, Pratyush Kumar, Mitesh M Khapra
23 Mar 2022

Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji
23 Mar 2022

Task-guided Disentangled Tuning for Pretrained Language Models
Findings, 2022
Jiali Zeng, Yu Jiang, Shuangzhi Wu, Yongjing Yin, Mu Li
22 Mar 2022

Word Order Does Matter (And Shuffled Language Models Know It)
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Vinit Ravishankar, Mostafa Abdou, Artur Kulmizev, Anders Søgaard
21 Mar 2022

Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová, Marian Verhelst
20 Mar 2022

Gaussian Multi-head Attention for Simultaneous Machine Translation
Findings, 2022
Shaolei Zhang, Yang Feng
17 Mar 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Eldar Kurtic, Daniel Fernando Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Ben Fineran, Michael Goin, Dan Alistarh
14 Mar 2022

A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification
Findings, 2022
Dairui Liu, Derek Greene, Ruihai Dong
14 Mar 2022

Visualizing and Understanding Patch Interactions in Vision Transformer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei
11 Mar 2022

Data-Efficient Structured Pruning via Submodular Optimization
Neural Information Processing Systems (NeurIPS), 2022
Marwa El Halabi, Suraj Srinivas, Damien Scieur
09 Mar 2022

Understanding microbiome dynamics via interpretable graph representation learning
Scientific Reports (Sci Rep), 2022
K. Melnyk, Kuba Weimann, Tim Conrad
02 Mar 2022

XAI for Transformers: Better Explanations through Conservative Propagation
International Conference on Machine Learning (ICML), 2022
Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf
15 Feb 2022

A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu, Julian McAuley
15 Feb 2022

ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
Neurocomputing, 2022
J. Tan, Y. Tan, C. Chan, Joon Huang Chuah
11 Feb 2022

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
International Conference on Learning Representations (ICLR), 2022
Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, T. Zhao
06 Feb 2022

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
29 Jan 2022

Rethinking Attention-Model Explainability through Faithfulness Violation Test
International Conference on Machine Learning (ICML), 2022
Zichen Liu, Haoliang Li, Yangyang Guo, Chen Kong, Jing Li, Shiqi Wang
28 Jan 2022

Can Model Compression Improve NLP Fairness
Guangxuan Xu, Qingyuan Hu
21 Jan 2022

21 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
Latency Adjustable Transformer Encoder for Language UnderstandingIEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
603
1
0
10 Jan 2022
Intelligent Online Selling Point Extraction for E-Commerce Recommendation
Xiaojie Guo, Shugen Wang, Hanqing Zhao, Shiliang Diao, Jiajia Chen, ..., Zhen He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu
16 Dec 2021

Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao, Leon Schmid, Dieuwke Hupkes, Ivan Titov
13 Dec 2021

13 Dec 2021
On the Compression of Natural Language Models
On the Compression of Natural Language Models
S. Damadi
119
0
0
13 Dec 2021
Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini
10 Dec 2021

Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View
WIREs Mechanisms of Disease (WIREs Mech Dis), 2021
Di Jin, Elena Sergeeva, W. Weng, Geeticka Chauhan, Peter Szolovits
05 Dec 2021

Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan, Xin Zhang, Sheng Zhang, Yan Pan, Lixiang Guo
22 Nov 2021

Does BERT look at sentiment lexicon?
International Joint Conference on the Analysis of Images, Social Networks and Texts (AISNT), 2021
E. Razova, S. Vychegzhanin, Evgeny Kotelnikov
19 Nov 2021

Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro, Valerio Basile, Viviana Bono, Sara Gallo
14 Nov 2021

A Survey on Green Deep Learning
Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
08 Nov 2021

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
07 Nov 2021

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems (NeurIPS), 2021
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
29 Oct 2021

Page 10 of 15