v1v2 (latest)

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

23 May 2019

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

50 / 741 papers shown

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck ModelsComputer Vision and Pattern Recognition (CVPR), 2025

Itay Benou

Tammy Riklin-Raviv

574

27 Feb 2025

Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors

316

27 Feb 2025

Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs

Angelica I Aviles-Rivero

Chuanlong Xie

Yao Zhu

540

26 Feb 2025

"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts

228

24 Feb 2025

LESA: Learnable LLM Layer Scaling-UpAnnual Meeting of the Association for Computational Linguistics (ACL), 2025

396

20 Feb 2025

EvoP: Robust LLM Inference via Evolutionary Pruning

640

19 Feb 2025

LLMs as a synthesis between symbolic and distributed approaches to language

Gemma Boleda

SyDa

307

17 Feb 2025

Exploring the Translation Mechanism of Large Language Models

416

17 Feb 2025

AI Generations: From AI 1.0 to AI 4.0

219

16 Feb 2025

Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum FlowAnnual Meeting of the Association for Computational Linguistics (ACL), 2025

Behrooz Azarkhalili

Maxwell Libbrecht

235

14 Feb 2025

Breaking Down Bias: On The Limits of Generalizable Pruning StrategiesConference on Fairness, Accountability and Transparency (FAccT), 2025

207

11 Feb 2025

Learning Task Representations from In-Context LearningAnnual Meeting of the Association for Computational Linguistics (ACL), 2025

288

08 Feb 2025

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers

1.3K

04 Feb 2025

Emergent Stack Representations in Modeling Counter Languages Using Transformers

Utkarsh Tiwari

Aviral Gupta

Michael Hahn

917

03 Feb 2025

HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs

128

02 Feb 2025

Ehrenfeucht-Haussler Rank and Chain of Thought

428

22 Jan 2025

Merging Feed-Forward Sublayers for Compressed Transformers

377

10 Jan 2025

CURing Large Models: Compression via CUR Decomposition

Sanghyeon Park

Soo-Mook Moon

352

08 Jan 2025

Visual Large Language Models for Generalized and Specialized Applications

461

06 Jan 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis ApproachComputer Vision and Pattern Recognition (CVPR), 2024

273

24 Dec 2024

ImagePiece: Content-aware Re-tokenization for Efficient Image RecognitionAAAI Conference on Artificial Intelligence (AAAI), 2024

219

21 Dec 2024

Rethinking Model Redundancy for Low-light Image Enhancement

348

21 Dec 2024

Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali

309

18 Dec 2024

Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models

275

15 Dec 2024

A Decade of Deep Learning: A Survey on The Magnificent Seven

Dilshod Azizov

Muhammad Arslan Manzoor

...

300

13 Dec 2024

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

...

426

03 Dec 2024

MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache

526

27 Nov 2024

LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer AttributionsComputer Vision and Pattern Recognition (CVPR), 2024

Faridoun Mehri

Mahdieh Soleymani Baghshah

Mohammad Taher Pilehvar

296

24 Nov 2024

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language ModelsComputer Vision and Pattern Recognition (CVPR), 2024

1.1K

22 Nov 2024

JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit

435

17 Nov 2024

An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2

Pepijn de Reus

Ana Oprescu

Jelle Zuidema

313

15 Nov 2024

Enhancing Brain Tumor Classification Using TrAdaBoost and Multi-Classifier Deep Learning Approaches

Mahin Mohammadi

Saman Jamshidi

247

31 Oct 2024

ResiDual Transformer Alignment with Spectral Decomposition

559

31 Oct 2024

Abrupt Learning in Transformers: A Case Study on Matrix CompletionNeural Information Processing Systems (NeurIPS), 2024

Pulkit Gopalani

Ekdeep Singh Lubana

Wei Hu

183

29 Oct 2024

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and ReasoningInternational Conference on Learning Representations (ICLR), 2024

397

25 Oct 2024

Large Language Models Are Overparameterized Text EncodersWorkshop on Representation Learning for NLP (RepL4NLP), 2024

Thennal D K

Tim Fischer

Chris Biemann

218

18 Oct 2024

Neuron-based Personality Trait Induction in Large Language Models

243

16 Oct 2024

AERO: Entropy-Guided Framework for Private LLM Inference

N. Jha

Brandon Reagen

492

16 Oct 2024

Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment

Liangwei Nathan Zheng

987

16 Oct 2024

MoH: Multi-Head Attention as Mixture-of-Head AttentionInternational Conference on Machine Learning (ICML), 2024

413

15 Oct 2024

Token Pruning using a Lightweight Background Aware Vision Transformer

279

12 Oct 2024

Robust AI-Generated Text Detection by Restricted EmbeddingsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Serguei Barannikov

181

10 Oct 2024

Mechanistic?BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024

Naomi Saphra

Sarah Wiegreffe

AI4CE

260

07 Oct 2024

Explanation sensitivity to the randomness of large language models: the case of journalistic text classification

Jérémie Bogaert

Marie-Catherine de Marneffe

Antonin Descampe

Louis Escouflaire

Cedrick Fairon

François-Xavier Standaert

343

07 Oct 2024

Activation Scaling for Steering and Interpreting Language ModelsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Niklas Stoehr

Kevin Du

Vésteinn Snæbjarnarson

282

07 Oct 2024

Differentiation and Specialization of Attention Heads via the Refined Local Learning CoefficientInternational Conference on Learning Representations (ICLR), 2024

223

03 Oct 2024

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models

Mohammad Hossein Sekhavat

Moin Nabi

Mehrdad Farajtabar

MoE

279

01 Oct 2024

Softmax is not Enough (for Sharp Size Generalisation)

Petar Velickovic

Christos Perivolaropoulos

Federico Barbero

Razvan Pascanu

413

01 Oct 2024

Enhancing elusive clues in knowledge learning by contrasting attention of language modelsAAAI Conference on Artificial Intelligence (AAAI), 2024

Jian Gao

Xiao Zhang

Ji Wu

Chenyi Guo

344

26 Sep 2024

Explanation Bottleneck ModelsAAAI Conference on Artificial Intelligence (AAAI), 2024

Shinýa Yamaguchi

Kosuke Nishida

LRM BDL

371

26 Sep 2024