Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
arXiv: 1905.09418
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (showing 50 of 741)
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Yao Fu. 14 May 2024.

Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan. International Conference on Machine Learning (ICML), 2024. 14 May 2024.

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, ..., Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz. 06 May 2024. [MoE, SyDa]

Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Aaron Klein, Jacek Golebiowski, Xingchen Ma, Valerio Perrone, Cédric Archambeau. 03 May 2024.

Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov. 25 Apr 2024.

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini. 12 Apr 2024.

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita. 10 Apr 2024. [KELM]

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Songlin Yang, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Yikang Shen. 08 Apr 2024. [MoE]

F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation
Junhong Wu, Yuchen Liu, Chengqing Zong. 07 Apr 2024. [CLL]

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne. 04 Apr 2024.

CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference
Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng. 02 Apr 2024.

On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan. 01 Apr 2024.

Efficiently Distilling LLMs for Edge Applications
Achintya Kundu, Fabian Lim, Aaron Chew, L. Wynter, Penny Chong, Rhui Dih Lee. 01 Apr 2024.

The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts. 26 Mar 2024.

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan. 21 Mar 2024.

SEVEN: Pruning Transformer Model by Reserving Sentinels
Jinying Xiao, Ping Li, Jie Nie, Zhe Tang. IEEE International Joint Conference on Neural Networks (IJCNN), 2024. 19 Mar 2024.

FBPT: A Fully Binary Point Transformer
Zhixing Hou, Yuzhang Shang, Yan Yan. IEEE International Conference on Robotics and Automation (ICRA), 2024. 15 Mar 2024. [MQ]

The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino. 13 Mar 2024. [MoE]

CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Homer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu. International Conference on Machine Learning (ICML), 2024. 12 Mar 2024.

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun. Computer Vision and Pattern Recognition (CVPR), 2024. 12 Mar 2024. [CLIP, VLM]

Explainable Learning with Gaussian Processes
Kurt Butler, Guanchao Feng, Petar M. Djurić. 11 Mar 2024.

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao. 08 Mar 2024. [MQ]

Where does In-context Translation Happen in Large Language Models
Suzanna Sia, David Mueller, Kevin Duh. 07 Mar 2024. [LRM]

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
Arijit Ghosh Chowdhury, Md. Mofijul Islam, Vaibhav Kumar, F. H. Shezan, Vaibhav Kumar, Vinija Jain, Vasu Sharma. 03 Mar 2024. [AAML, PILM]

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder. 02 Mar 2024. [VLM]

Dissecting Language Models: Machine Unlearning via Selective Pruning
Nicholas Pochinkov, Nandi Schoots. 02 Mar 2024. [MILM, MU]

Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations
Stephanie Brandl, Oliver Eberle, Tiago F. R. Ribeiro, Anders Søgaard, Nora Hollenstein. 29 Feb 2024.

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurélie C. Lozano, Payel Das, Georgios Kollias. 28 Feb 2024.

Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans. 27 Feb 2024. [MedIm]

Information Flow Routes: Automatically Interpreting Language Models at Scale
Javier Ferrando, Elena Voita. 27 Feb 2024.

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda, Kyriakos Axiotis, Gang Fu, M. Bateni, Vahab Mirrokni. 27 Feb 2024.

Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers
Orhan Eren Akgün, Néstor Cuevas, Matheus Farias, Daniel Garces. 20 Feb 2024.

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He. 15 Feb 2024. [MQ]

Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda. 14 Feb 2024.

Task-conditioned adaptation of visual features in multi-task policy learning
Pierre Marza, L. Matignon, Olivier Simonin, Christian Wolf. 12 Feb 2024.

Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo. 07 Feb 2024. [ViT]

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao. 05 Feb 2024.

Approximate Attributions for Off-the-Shelf Siamese Transformers
Lucas Moller, Dmitry Nikolaev, Sebastian Padó. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024. 05 Feb 2024.

Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods
Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song. 05 Feb 2024.

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Bharat Runwal, Tejaswini Pedapati, Pin-Yu Chen. 02 Feb 2024. [MoE]

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro. Computer Vision and Pattern Recognition (CVPR), 2024. 29 Jan 2024. [ViT]

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang. International Conference on Machine Learning (ICML), 2024. 26 Jan 2024.

Dynamic Layer Tying for Parameter-Efficient Transformers
Tamir David Hay, Lior Wolf. International Conference on Learning Representations (ICLR), 2024. 23 Jan 2024.

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Rohit Das, Wujie Wen, Hang Liu, Caiwen Ding. 22 Jan 2024.

Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric
Golara Javadi, K. Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny. 20 Jan 2024.

LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan, Andreas E. Savakis. 20 Jan 2024. [MQ]

Understanding Video Transformers via Universal Concept Discovery
M. Kowal, Achal Dave, Rares Andrei Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, P. Tokmakov. 19 Jan 2024. [ViT]

When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, Shuyuan Yang. 19 Jan 2024.

Better Explain Transformers by Illuminating Important Information
Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li. 18 Jan 2024. [FAtt]