Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov
arXiv: 1905.09418
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" (showing 50 of 741)
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Yao Fu. 14 May 2024.

Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan. International Conference on Machine Learning (ICML), 2024. 14 May 2024.

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, ..., Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz. 06 May 2024. [MoE, SyDa]

Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Aaron Klein, Jacek Golebiowski, Xingchen Ma, Valerio Perrone, Cédric Archambeau. 03 May 2024.

Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov. 25 Apr 2024.

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini. 12 Apr 2024.

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita. 10 Apr 2024. [KELM]

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Songlin Yang, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Yikang Shen. 08 Apr 2024. [MoE]

F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation
Junhong Wu, Yuchen Liu, Chengqing Zong. 07 Apr 2024. [CLL]

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne. 04 Apr 2024.

CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference
Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng. 02 Apr 2024.

On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan. 01 Apr 2024.

Efficiently Distilling LLMs for Edge Applications
Achintya Kundu, Fabian Lim, Aaron Chew, L. Wynter, Penny Chong, Rhui Dih Lee. 01 Apr 2024.

The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts. 26 Mar 2024.

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan. 21 Mar 2024.

SEVEN: Pruning Transformer Model by Reserving Sentinels
Jinying Xiao, Ping Li, Jie Nie, Zhe Tang. IEEE International Joint Conference on Neural Networks (IJCNN), 2024. 19 Mar 2024.

FBPT: A Fully Binary Point Transformer
Zhixing Hou, Yuzhang Shang, Yan Yan. IEEE International Conference on Robotics and Automation (ICRA), 2024. 15 Mar 2024. [MQ]

The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino. 13 Mar 2024. [MoE]

CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Homer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu. International Conference on Machine Learning (ICML), 2024. 12 Mar 2024.

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun. Computer Vision and Pattern Recognition (CVPR), 2024. 12 Mar 2024. [CLIP, VLM]

Explainable Learning with Gaussian Processes
Kurt Butler, Guanchao Feng, Petar M. Djurić. 11 Mar 2024.

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao. 08 Mar 2024. [MQ]

Where does In-context Translation Happen in Large Language Models
Suzanna Sia, David Mueller, Kevin Duh. 07 Mar 2024. [LRM]

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
Arijit Ghosh Chowdhury, Md. Mofijul Islam, Vaibhav Kumar, F. H. Shezan, Vaibhav Kumar, Vinija Jain, Vasu Sharma. 03 Mar 2024. [AAML, PILM]

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder. 02 Mar 2024. [VLM]

Dissecting Language Models: Machine Unlearning via Selective Pruning
Nicholas Pochinkov, Nandi Schoots. 02 Mar 2024. [MILM, MU]

Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations
Stephanie Brandl, Oliver Eberle, Tiago F. R. Ribeiro, Anders Søgaard, Nora Hollenstein. 29 Feb 2024.

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurélie C. Lozano, Payel Das, Georgios Kollias. 28 Feb 2024.

Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans. 27 Feb 2024. [MedIm]

Information Flow Routes: Automatically Interpreting Language Models at Scale
Javier Ferrando, Elena Voita. 27 Feb 2024.

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda, Kyriakos Axiotis, Gang Fu, M. Bateni, Vahab Mirrokni. 27 Feb 2024.

Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers
Orhan Eren Akgün, Néstor Cuevas, Matheus Farias, Daniel Garces. 20 Feb 2024.

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He. 15 Feb 2024. [MQ]

Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda. 14 Feb 2024.

Task-conditioned adaptation of visual features in multi-task policy learning
Pierre Marza, L. Matignon, Olivier Simonin, Christian Wolf. 12 Feb 2024.

Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo. 07 Feb 2024. [ViT]

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao. 05 Feb 2024.

Approximate Attributions for Off-the-Shelf Siamese Transformers
Lucas Moller, Dmitry Nikolaev, Sebastian Padó. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024. 05 Feb 2024.

Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods
Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song. 05 Feb 2024.

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Bharat Runwal, Tejaswini Pedapati, Pin-Yu Chen. 02 Feb 2024. [MoE]

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro. Computer Vision and Pattern Recognition (CVPR), 2024. 29 Jan 2024. [ViT]

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang. International Conference on Machine Learning (ICML), 2024. 26 Jan 2024.

Dynamic Layer Tying for Parameter-Efficient Transformers
Tamir David Hay, Lior Wolf. International Conference on Learning Representations (ICLR), 2024. 23 Jan 2024.

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Rohit Das, Wujie Wen, Hang Liu, Caiwen Ding. 22 Jan 2024.

Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric
Golara Javadi, K. Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny. 20 Jan 2024.

LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan, Andreas E. Savakis. 20 Jan 2024. [MQ]

Understanding Video Transformers via Universal Concept Discovery
M. Kowal, Achal Dave, Rares Andrei Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, P. Tokmakov. 19 Jan 2024. [ViT]

When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, Shuyuan Yang. 19 Jan 2024.

Better Explain Transformers by Illuminating Important Information
Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li. 18 Jan 2024. [FAtt]