arXiv:1905.09418
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
23 May 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
Papers citing "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned"
50 / 741 papers shown
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Computer Vision and Pattern Recognition (CVPR), 2025
Itay Benou
Tammy Riklin-Raviv
574
6
0
27 Feb 2025
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji
Tatsuya Hiraoka
Yuchang Cheng
Eiji Aramaki
Tomoya Iwakura
316
0
0
27 Feb 2025
Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Angelica I Aviles-Rivero
Chuanlong Xie
Yao Zhu
540
4
0
26 Feb 2025
"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal
M. Read
S. Karunasekera
Muhammad Imran
228
0
0
24 Feb 2025
LESA: Learnable LLM Layer Scaling-Up
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yifei Yang
Zouying Cao
Xinbei Ma
Yao Yao
L. Qin
Zhongfu Chen
Hai Zhao
396
3
0
20 Feb 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
640
3
0
19 Feb 2025
LLMs as a synthesis between symbolic and distributed approaches to language
Gemma Boleda
SyDa
307
0
0
17 Feb 2025
Exploring the Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
416
2
0
17 Feb 2025
AI Generations: From AI 1.0 to AI 4.0
Jiahao Wu
Hengxu You
Jing Du
AI4TS
219
4
0
16 Feb 2025
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Behrooz Azarkhalili
Maxwell Libbrecht
235
0
0
14 Feb 2025
Breaking Down Bias: On The Limits of Generalizable Pruning Strategies
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Sibo Ma
Alejandro Salinas
Peter Henderson
Julian Nyarko
207
2
0
11 Feb 2025
Learning Task Representations from In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Baturay Saglam
Zhuoran Yang
Dionysis Kalogerias
Amin Karbasi
288
7
0
08 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
1.3K
13
0
04 Feb 2025
Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari
Aviral Gupta
Michael Hahn
917
1
0
03 Feb 2025
HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs
Mehdi Makni
Kayhan Behdin
Zheng Xu
Natalia Ponomareva
Rahul Mazumder
128
1
0
02 Feb 2025
Ehrenfeucht-Haussler Rank and Chain of Thought
Pablo Barceló
Chris Köcher
Tomasz Steifer
LRM
428
2
0
22 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
377
0
0
10 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
352
2
0
08 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
461
33
0
06 Jan 2025
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Computer Vision and Pattern Recognition (CVPR), 2024
Jing Bi
Junjia Guo
Yunlong Tang
Lianggong Wen
Zhang Liu
Chenliang Xu
273
6
0
24 Dec 2024
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2024
Seungdong Yoa
Seungjun Lee
Hyeseung Cho
Bumsoo Kim
Woohyung Lim
ViT
219
1
0
21 Dec 2024
Rethinking Model Redundancy for Low-light Image Enhancement
Tong Li
Lizhi Wang
Hansen Feng
Lin Zhu
Wanxuan Lu
Hua Huang
348
0
0
21 Dec 2024
Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali
Sharad Duwal
Suraj Prasai
Suresh Manandhar
CLL
309
3
0
18 Dec 2024
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
275
1
0
15 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Liang Luo
...
Liang Li
Houcheng Su
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
300
0
0
13 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
Shijie Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
426
51
0
03 Dec 2024
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Akshat Sharma
Hangliang Ding
Jianping Li
Neel Dani
Minjia Zhang
526
2
0
27 Nov 2024
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Computer Vision and Pattern Recognition (CVPR), 2024
Faridoun Mehri
Mahdieh Soleymani Baghshah
Mohammad Taher Pilehvar
296
3
0
24 Nov 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Junzhe Chen
Tianshu Zhang
Shijie Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
1.1K
11
0
22 Nov 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Peng Kuang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
435
16
0
17 Nov 2024
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
Pepijn de Reus
Ana Oprescu
Jelle Zuidema
MQ
313
2
0
15 Nov 2024
Enhancing Brain Tumor Classification Using TrAdaBoost and Multi-Classifier Deep Learning Approaches
Mahin Mohammadi
Saman Jamshidi
247
3
0
31 Oct 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
559
4
0
31 Oct 2024
Abrupt Learning in Transformers: A Case Study on Matrix Completion
Neural Information Processing Systems (NeurIPS), 2024
Pulkit Gopalani
Ekdeep Singh Lubana
Wei Hu
183
8
0
29 Oct 2024
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
International Conference on Learning Representations (ICLR), 2024
Yu Fu
Zefan Cai
Abedelkadir Asi
Wayne Xiong
Yue Dong
Wen Xiao
397
54
0
25 Oct 2024
Large Language Models Are Overparameterized Text Encoders
Workshop on Representation Learning for NLP (RepL4NLP), 2024
Thennal D K
Tim Fischer
Chris Biemann
218
4
0
18 Oct 2024
Neuron-based Personality Trait Induction in Large Language Models
Jia Deng
Tianyi Tang
Yanbin Yin
Wenhao Yang
Wayne Xin Zhao
Ji-Rong Wen
243
3
0
16 Oct 2024
AERO: Entropy-Guided Framework for Private LLM Inference
N. Jha
Brandon Reagen
492
5
0
16 Oct 2024
Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment
Liangwei Nathan Zheng
Chang George Dong
Wei Emma Zhang
Lin Yue
Miao Xu
Olaf Maennel
Weitong Chen
AI4TS
987
2
0
16 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
International Conference on Machine Learning (ICML), 2024
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
413
37
0
15 Oct 2024
Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah
Ravish Kumar
Honnesh Rohmetra
Ehsan Saboori
ViT
279
2
0
12 Oct 2024
Robust AI-Generated Text Detection by Restricted Embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kristian Kuznetsov
Eduard Tulchinskii
Laida Kushnareva
German Magai
Serguei Barannikov
Sergey I. Nikolenko
Irina Piontkovskaya
DeLMO
181
15
0
10 Oct 2024
Mechanistic?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024
Naomi Saphra
Sarah Wiegreffe
AI4CE
260
34
0
07 Oct 2024
Explanation sensitivity to the randomness of large language models: the case of journalistic text classification
Jérémie Bogaert
Marie-Catherine de Marneffe
Antonin Descampe
Louis Escouflaire
Cedrick Fairon
François-Xavier Standaert
343
3
0
07 Oct 2024
Activation Scaling for Steering and Interpreting Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Niklas Stoehr
Kevin Du
Vésteinn Snæbjarnarson
Robert West
Robert Bamler
Aaron Schein
LLMSV
LRM
282
14
0
07 Oct 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
International Conference on Learning Representations (ICLR), 2024
George Wang
Jesse Hoogland
Stan van Wingerden
Zach Furman
Daniel Murfet
OffRL
223
23
0
03 Oct 2024
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Keivan Alizadeh
Iman Mirzadeh
Hooman Shahrokhi
Dmitry Belenko
Frank Sun
Minsik Cho
Mohammad Hossein Sekhavat
Moin Nabi
Mehrdad Farajtabar
MoE
279
2
0
01 Oct 2024
Softmax is not Enough (for Sharp Size Generalisation)
Petar Veličković
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
413
19
0
01 Oct 2024
Enhancing elusive clues in knowledge learning by contrasting attention of language models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jian Gao
Xiao Zhang
Ji Wu
Chenyi Guo
344
0
0
26 Sep 2024
Explanation Bottleneck Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shin'ya Yamaguchi
Kosuke Nishida
LRM
BDL
371
4
0
26 Sep 2024