2406.09519
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
13 June 2024
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
Papers citing
"Talking Heads: Understanding Inter-layer Communication in Transformer Language Models"
50 / 65 papers shown
Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity
Pamela D. Rivière
Sean Trott
72
1
0
26 Nov 2025
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
A. Ahmad
Abhinav Joshi
Ashutosh Modi
68
0
0
25 Nov 2025
LLMs Process Lists With General Filter Heads
Arnab Sen Sharma
Giordano Rogers
Natalie Shapira
David Bau
148
0
0
30 Oct 2025
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile
Valentino Maiorca
Diego Doimo
Francesco Locatello
Alberto Cazzaniga
113
2
0
24 Oct 2025
Direct Multi-Token Decoding
Xuan Luo
Weizhi Wang
Xifeng Yan
OffRL
96
0
0
13 Oct 2025
Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research
Sean Trott
110
1
0
26 Sep 2025
HARP: Hallucination Detection via Reasoning Subspace Projection
Junjie Hu
Gang Tu
ShengYu Cheng
Jinxin Li
Jinting Wang
Rui Chen
Zhilong Zhou
Dongbo Shan
174
0
0
15 Sep 2025
I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
Arjun Khurana
Jack Merullo
AuLLM
132
0
0
04 Aug 2025
HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs
Zhaolin Cai
Fan Li
Ziwei Zheng
Yanjun Qin
150
1
0
23 Jul 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mengru Wang
Ziwen Xu
Shengyu Mao
Shumin Deng
Zhaopeng Tu
Ningyu Zhang
LLMSV
443
9
0
23 May 2025
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang
Benjamin Bergen
601
1
0
21 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
423
3
0
19 Apr 2025
Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond
Frank Yingjie Huo
Neil F. Johnson
253
3
0
06 Apr 2025
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman
Lucius Bushnaq
Lee D. Sharkey
334
3
0
31 Mar 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan
Robin Jia
KELM
MU
316
0
0
27 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE
AI4CE
488
9
0
13 Feb 2025
Hymba: A Hybrid-head Architecture for Small Language Models
International Conference on Learning Representations (ICLR), 2025
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
322
50
0
20 Nov 2024
SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values
Chengwei Sun
Jiwei Wei
Yujia Wu
Yiming Shi
Shiyuan He
Zeyu Ma
Ning Xie
Yang Yang
151
4
0
09 Sep 2024
The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
Computational Linguistics (CL), 2024
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
494
34
0
02 Aug 2024
Relational Composition in Neural Networks: A Survey and Call to Action
Martin Wattenberg
Fernanda Viégas
CoGe
194
17
0
19 Jul 2024
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ting-Yun Chang
Jesse Thomason
Robin Jia
414
6
0
19 Jun 2024
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Klaudia Bałazy
Mohammadreza Banaei
Karl Aberer
Jacek Tabor
339
51
0
27 May 2024
TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation
Chengcheng Feng
Mu He
Qiuyu Tian
Haojie Yin
Xiaofang Zhao
Hongwei Tang
Xingqiang Wei
DiffM
219
4
0
18 May 2024
Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ruizhe Li
Yanjun Gao
KELM
332
13
0
06 May 2024
Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan
Arthur Conmy
Lewis Smith
Tom Lieberum
Vikrant Varma
János Kramár
Rohin Shah
Neel Nanda
RALM
377
130
0
24 Apr 2024
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
International Conference on Machine Learning (ICML), 2024
Aaditya K. Singh
Ted Moskovitz
Felix Hill
Stephanie C. Y. Chan
Andrew M. Saxe
AI4CE
286
54
0
10 Apr 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
555
250
0
28 Mar 2024
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
226
79
0
26 Mar 2024
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
International Conference on Learning Representations (ICLR), 2025
Xin Wang
Yu Zheng
Zhongwei Wan
Mi Zhang
MQ
501
146
0
12 Mar 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
343
92
0
26 Feb 2024
AI-as-exploration: Navigating intelligence space
Dimitri Coelho Mollo
234
2
0
15 Jan 2024
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
International Conference on Learning Representations (ICLR), 2024
Gautam Reddy
305
91
0
03 Dec 2023
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
International Conference on Learning Representations (ICLR), 2024
Aleksandar Makelov
Georg Lange
Neel Nanda
237
37
0
28 Nov 2023
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
International Conference on Learning Representations (ICLR), 2024
Han Guo
P. Greengard
Eric P. Xing
Yoon Kim
MQ
460
82
0
20 Nov 2023
The Linear Representation Hypothesis and the Geometry of Large Language Models
International Conference on Machine Learning (ICML), 2024
Kiho Park
Yo Joong Choe
Victor Veitch
LLMSV
MILM
461
318
0
07 Nov 2023
How do Language Models Bind Entities in Context?
International Conference on Learning Representations (ICLR), 2024
Jiahai Feng
Jacob Steinhardt
311
64
0
26 Oct 2023
What Algorithms can Transformers Learn? A Study in Length Generalization
International Conference on Learning Representations (ICLR), 2024
Hattie Zhou
Arwen Bradley
Etai Littwin
Noam Razin
Omid Saremi
Josh Susskind
Samy Bengio
Preetum Nakkiran
283
160
0
24 Oct 2023
Understanding Addition in Transformers
Abir Harrasse
Fazl Barez
597
28
0
19 Oct 2023
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
International Conference on Learning Representations (ICLR), 2024
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
317
543
0
17 Oct 2023
Instilling Inductive Biases with Subnetworks
Enyan Zhang
Michael A. Lepori
Ellie Pavlick
AI4CE
261
5
0
17 Oct 2023
Circuit Component Reuse Across Tasks in Transformer Language Models
International Conference on Learning Representations (ICLR), 2024
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
368
98
0
12 Oct 2023
Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
SILM
434
266
0
03 Oct 2023
Sparse Autoencoders Find Highly Interpretable Features in Language Models
International Conference on Learning Representations (ICLR), 2024
Hoagy Cunningham
Aidan Ewart
Logan Riggs
R. Huben
Lee Sharkey
MILM
662
775
0
15 Sep 2023
Large Language Models Are Not Robust Multiple Choice Selectors
International Conference on Learning Representations (ICLR), 2024
Chujie Zheng
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
487
365
0
07 Sep 2023
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Pouya Pezeshkpour
Estevam R. Hruschka
LRM
259
196
0
22 Aug 2023
Lost in the Middle: How Language Models Use Long Contexts
Transactions of the Association for Computational Linguistics (TACL), 2023
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
555
2,594
0
06 Jul 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
519
286
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Neural Information Processing Systems (NeurIPS), 2023
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
1.0K
179
0
30 Apr 2023
Localizing Model Behavior with Path Patching
Nicholas W. Goldowsky-Dill
Chris MacLeod
L. Sato
Aryaman Arora
485
122
0
12 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
International Conference on Machine Learning (ICML), 2023
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
384
1,621
0
03 Apr 2023