Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

50 / 741 papers shown
Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns
Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Ting Liu, Bing Qin
24 Sep 2024

ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
Asia-Pacific Computer Systems Architecture Conference (ACSA), 2024
Shihua Sun, Kenechukwu Nwodo, Shridatt Sugrim, Angelos Stavrou, Haining Wang
20 Sep 2024

Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence
Alessandro Riva, Alessandro Raganato, Simone Melzi
20 Sep 2024

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
International Conference on Computational Linguistics (COLING), 2024
Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin
20 Sep 2024

ART: Artifact Removal Transformer for Reconstructing Noise-Free Multichannel Electroencephalographic Signals
Chun-Hsiang Chuang, Kong-Yi Chang, Chih-Sheng Huang, Anne-Mei Bessas
11 Sep 2024
AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration
Hongyi Cai, Mohammad Mahdinur Rahman, Mohammad Shahid Akhtar, Jie Li, Jingyu Wu, Zhili Fang
10 Sep 2024

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Daniel F Campos, Z. Yao, Yuxiong He
10 Sep 2024

Attention Heads of Large Language Models: A Survey
Patterns, 2024
Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Chenyang Xi, Junchi Yan, Bo Tang, Feiyu Xiong, Zhiyu Li
05 Sep 2024

Neural HD Map Generation from Multiple Vectorized Tiles Locally Produced by Autonomous Vehicles
International Conference on Spatial Data and Intelligence (ICSDI), 2024
Miao Fan, Yi Yao, Jianping Zhang, Xiangbo Song, Daihui Wu
05 Sep 2024

Collaborative Learning for Enhanced Unsupervised Domain Adaptation
Minhee Cho, Hyesong Choi, Hayeon Jo, Dongbo Min
04 Sep 2024
Explainable Artificial Intelligence: A Survey of Needs, Techniques, Applications, and Future Direction
Melkamu Mersha, Khang Lam, Joseph Wood, Ali AlShami, Jugal Kalita
30 Aug 2024

MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning
Seungbeom Hu, ChanJun Park, Andrew Ferraiuolo, Sang-Ki Ko, Jinwoo Kim, Haein Song, Jieung Kim
24 Aug 2024

An alternative formulation of attention pooling function in translation
Eddie Conti
23 Aug 2024

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
Zhonghao He, Jascha Achterberg, Katie Collins, Kevin K. Nejad, Danyal Akarca, ..., Chole Li, Kai J. Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay
22 Aug 2024

Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers
Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
22 Aug 2024
On Learnable Parameters of Optimal and Suboptimal Deep Learning Models
International Conference on Neural Information Processing (ICONIP), 2024
Ziwei Zheng, Huizhi Liang, V. Snás̃el, Vito Latora, Panos Pardalos, Giuseppe Nicosia, Varun Ojha
21 Aug 2024

Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang
17 Aug 2024

Deep-change at AXOLOTL-24: Orchestrating WSD and WSI Models for Semantic Change Modeling
Workshop on Computational Approaches to Historical Language Change (CAHLC), 2024
Denis Kokosinskii, Mikhail Kuklin, Nikolay Arefyev
09 Aug 2024

The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights
Nura Aljaafari, Danilo S. Carvalho, André Freitas
05 Aug 2024

Cross-layer Attention Sharing for Pre-trained Large Language Models
Yongyu Mu, Yuzhang Wu, Yuchun Fan, Chenglong Wang, Hengyu Li, ..., Murun Yang, Fandong Meng, Jie Zhou, Tong Xiao, Jingbo Zhu
04 Aug 2024
Empowering Clinicians with Medical Decision Transformers: A Framework for Sepsis Treatment
A. Rahman, Pranav Agarwal, R. Noumeir, P. Jouvet, Vincent Michalski, Samira Ebrahimi Kahou
28 Jul 2024

Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads
Xihui Lin, Yunan Zhang, Suyu Ge, Barun Patra, Vishrav Chaudhary, Hao Peng, Xia Song
25 Jul 2024

Reconstruct the Pruned Model without Any Retraining
Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang
18 Jul 2024

Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference
Ghadeer Jaradat, M. Tolba, Ghada Alsuhli, Hani Saleh, Mahmoud Al-Qutayri, Thanos Stouraitis, Baker Mohammad
17 Jul 2024

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. K. Zhou
16 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang
05 Jul 2024

Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
Lei Yu, Jingcheng Niu, Zining Zhu, Xi Chen, Gerald Penn
04 Jul 2024

Croppable Knowledge Graph Embedding
Yushan Zhu, Wen Zhang, Zhiqiang Liu, Yin Hua, Lei Liang, H. Chen
03 Jul 2024

Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
02 Jul 2024

Interpreting Attention Layer Outputs with Sparse Autoencoders
Connor Kissane, Robert Krzyzanowski, Joseph Isaac Bloom, Arthur Conmy, Neel Nanda
25 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
19 Jun 2024

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
R. Teo, Tan M. Nguyen
19 Jun 2024

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ting-Yun Chang, Jesse Thomason, Robin Jia
19 Jun 2024

InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States
Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang
17 Jun 2024

Inpainting the Gaps: A Novel Framework for Evaluating Explanation Methods in Vision Transformers
Lokesh Badisa, Sumohana S. Channappayya
17 Jun 2024
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu, Qifan Wang, Fuli Feng
17 Jun 2024

Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, Korbinian Riedhammer, Tobias Bocklet
16 Jun 2024

Investigating the translation capabilities of Large Language Models trained on parallel data only
Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca de Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero
13 Jun 2024

Analyzing Multi-Head Attention on Trojan BERT Models
Jingwei Wang
12 Jun 2024

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder
12 Jun 2024
Attention as a Hypernetwork
International Conference on Learning Representations (ICLR), 2024
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
09 Jun 2024

VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
Oshin Dutta, Ritvik Gupta, Sumeet Agarwal
07 Jun 2024

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
06 Jun 2024

Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
06 Jun 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou
31 May 2024
STAT: Shrinking Transformers After Training
Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle
29 May 2024

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky
24 May 2024

Large Language Model Pruning
Hanjuan Huang, Hao-Jia Song, H. Pao
24 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green
22 May 2024

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models
Neural Information Processing Systems (NeurIPS), 2024
R. Sukthanker, Arber Zela, B. Staffler, Aaron Klein, Lennart Purucker, Jorg K. H. Franke, Katharina Eggensperger
16 May 2024
Page 4 of 15