Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov

Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"

50 / 741 papers shown
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Computer Vision and Pattern Recognition (CVPR), 2025
Itay Benou
Tammy Riklin-Raviv
574
6
0
27 Feb 2025
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji
Tatsuya Hiraoka
Yuchang Cheng
Eiji Aramaki
Tomoya Iwakura
316
0
0
27 Feb 2025
Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Angelica I Aviles-Rivero
Chuanlong Xie
Yao Zhu
540
4
0
26 Feb 2025
"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal
M. Read
S. Karunasekera
Muhammad Imran
228
0
0
24 Feb 2025
LESA: Learnable LLM Layer Scaling-Up
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yifei Yang
Zouying Cao
Xinbei Ma
Yao Yao
L. Qin
Zhongfu Chen
Hai Zhao
396
3
0
20 Feb 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
640
3
0
19 Feb 2025
LLMs as a synthesis between symbolic and distributed approaches to language
Gemma Boleda
SyDa
307
0
0
17 Feb 2025
Exploring the Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
416
2
0
17 Feb 2025
AI Generations: From AI 1.0 to AI 4.0
Jiahao Wu
Hengxu You
Jing Du
AI4TS
219
4
0
16 Feb 2025
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Behrooz Azarkhalili
Maxwell Libbrecht
235
0
0
14 Feb 2025
Breaking Down Bias: On The Limits of Generalizable Pruning Strategies
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Sibo Ma
Alejandro Salinas
Peter Henderson
Julian Nyarko
207
2
0
11 Feb 2025
Learning Task Representations from In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Baturay Saglam
Zhuoran Yang
Zhuoran Yang
Dionysis Kalogerias
Amin Karbasi
288
7
0
08 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
1.3K
13
0
04 Feb 2025
Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari
Aviral Gupta
Michael Hahn
917
1
0
03 Feb 2025
HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs
Mehdi Makni
Kayhan Behdin
Zheng Xu
Natalia Ponomareva
Rahul Mazumder
128
1
0
02 Feb 2025
Ehrenfeucht-Haussler Rank and Chain of Thought
Pablo Barceló
Chris Köcher
Tomasz Steifer
LRM
428
2
0
22 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
377
0
0
10 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
352
2
0
08 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
461
33
0
06 Jan 2025
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Computer Vision and Pattern Recognition (CVPR), 2024
Jing Bi
Junjia Guo
Yunlong Tang
Lianggong Wen
Zhang Liu
Chenliang Xu
273
6
0
24 Dec 2024
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2024
Seungdong Yoa
Seungjun Lee
Hyeseung Cho
Bumsoo Kim
Woohyung Lim
ViT
219
1
0
21 Dec 2024
Rethinking Model Redundancy for Low-light Image Enhancement
Tong Li
Lizhi Wang
Hansen Feng
Lin Zhu
Wanxuan Lu
Hua Huang
348
0
0
21 Dec 2024
Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali
Sharad Duwal
Suraj Prasai
Suresh Manandhar
CLL
309
3
0
18 Dec 2024
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
275
1
0
15 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Liang Luo
...
Liang Li
Houcheng Su
Yu Zhong
Wei Liu
Shangsong Liang
OOD AI4TS MedIm
300
0
0
13 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
Shijie Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
426
51
0
03 Dec 2024
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Akshat Sharma
Hangliang Ding
Jianping Li
Neel Dani
Minjia Zhang
526
2
0
27 Nov 2024
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Computer Vision and Pattern Recognition (CVPR), 2024
Faridoun Mehri
Mahdieh Soleymani Baghshah
Mohammad Taher Pilehvar
296
3
0
24 Nov 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Junzhe Chen
Tianshu Zhang
Shijie Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM VLM
1.1K
11
0
22 Nov 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Peng Kuang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
435
16
0
17 Nov 2024
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
Pepijn de Reus
Ana Oprescu
Jelle Zuidema
MQ
313
2
0
15 Nov 2024
Enhancing Brain Tumor Classification Using TrAdaBoost and Multi-Classifier Deep Learning Approaches
Mahin Mohammadi
Saman Jamshidi
247
3
0
31 Oct 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
559
4
0
31 Oct 2024
Abrupt Learning in Transformers: A Case Study on Matrix Completion
Neural Information Processing Systems (NeurIPS), 2024
Pulkit Gopalani
Ekdeep Singh Lubana
Wei Hu
183
8
0
29 Oct 2024
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
International Conference on Learning Representations (ICLR), 2024
Yu Fu
Zefan Cai
Abedelkadir Asi
Wayne Xiong
Yue Dong
Wen Xiao
397
54
0
25 Oct 2024
Large Language Models Are Overparameterized Text Encoders
Workshop on Representation Learning for NLP (RepL4NLP), 2024
Thennal D K
Tim Fischer
Chris Biemann
218
4
0
18 Oct 2024
Neuron-based Personality Trait Induction in Large Language Models
Jia Deng
Tianyi Tang
Yanbin Yin
Wenhao Yang
Wayne Xin Zhao
Ji-Rong Wen
243
3
0
16 Oct 2024
AERO: Entropy-Guided Framework for Private LLM Inference
N. Jha
Brandon Reagen
492
5
0
16 Oct 2024
Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment
Liangwei Nathan Zheng
Chang George Dong
Wei Emma Zhang
Lin Yue
Miao Xu
Olaf Maennel
Weitong Chen
AI4TS
987
2
0
16 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
International Conference on Machine Learning (ICML), 2024
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
413
37
0
15 Oct 2024
Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah
Ravish Kumar
Honnesh Rohmetra
Ehsan Saboori
ViT
279
2
0
12 Oct 2024
Robust AI-Generated Text Detection by Restricted Embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kristian Kuznetsov
Eduard Tulchinskii
Laida Kushnareva
German Magai
Serguei Barannikov
Sergey I. Nikolenko
Irina Piontkovskaya
DeLMO
181
15
0
10 Oct 2024
Mechanistic?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024
Naomi Saphra
Sarah Wiegreffe
AI4CE
260
34
0
07 Oct 2024
Explanation sensitivity to the randomness of large language models: the case of journalistic text classification
Jérémie Bogaert
Marie-Catherine de Marneffe
Antonin Descampe
Louis Escouflaire
Cedrick Fairon
François-Xavier Standaert
343
3
0
07 Oct 2024
Activation Scaling for Steering and Interpreting Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Niklas Stoehr
Kevin Du
Vésteinn Snæbjarnarson
Robert West
Robert Bamler
Aaron Schein
LLMSV LRM
282
14
0
07 Oct 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
International Conference on Learning Representations (ICLR), 2024
George Wang
Jesse Hoogland
Stan van Wingerden
Zach Furman
Daniel Murfet
OffRL
223
23
0
03 Oct 2024
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Keivan Alizadeh
Iman Mirzadeh
Hooman Shahrokhi
Dmitry Belenko
Frank Sun
Minsik Cho
Mohammad Hossein Sekhavat
Moin Nabi
Mehrdad Farajtabar
MoE
279
2
0
01 Oct 2024
Softmax is not Enough (for Sharp Size Generalisation)
Petar Velickovic
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
413
19
0
01 Oct 2024
Enhancing elusive clues in knowledge learning by contrasting attention of language models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jian Gao
Xiao Zhang
Ji Wu
Chenyi Guo
344
0
0
26 Sep 2024
Explanation Bottleneck Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shinýa Yamaguchi
Kosuke Nishida
LRM BDL
371
4
0
26 Sep 2024