Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

13 June 2024

Jack Merullo

Carsten Eickhoff

Ellie Pavlick

Papers citing "Talking Heads: Understanding Inter-layer Communication in Transformer Language Models"

50 / 65 papers shown

Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity

Pamela D. Rivière

Sean Trott

26 Nov 2025

Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits

A. Ahmad

Abhinav Joshi

Ashutosh Modi

25 Nov 2025

LLMs Process Lists With General Filter Heads

148

30 Oct 2025

Head Pursuit: Probing Attention Specialization in Multimodal Transformers

113

24 Oct 2025

Direct Multi-Token Decoding

13 Oct 2025

Toward a Theory of Generalizability in LLM Mechanistic Interpretability Research

Sean Trott

110

26 Sep 2025

HARP: Hallucination Detection via Reasoning Subspace Projection

174

15 Sep 2025

I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2

132

04 Aug 2025

HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs

150

23 Jul 2025

Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

443

23 May 2025

Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models

Tyler A. Chang

Benjamin Bergen

601

21 Apr 2025

The Geometry of Self-Verification in a Task-Specific Reasoning Model

423

19 Apr 2025

Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond

Frank Yingjie Huo

Neil F. Johnson

253

06 Apr 2025

Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition

Brianna Chrisman

Lucius Bushnaq

Lee D. Sharkey

334

31 Mar 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries

Tianyi Lorena Yan

Robin Jia

KELM MU

316

27 Feb 2025

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

488

13 Feb 2025

Hymba: A Hybrid-head Architecture for Small Language Models

International Conference on Learning Representations (ICLR), 2024

...

322

20 Nov 2024

SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

Ning Xie

Yang Yang

151

09 Sep 2024

The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis

Computational Linguistics (CL), 2024

...

494

02 Aug 2024

Relational Composition in Neural Networks: A Survey and Call to Action

Martin Wattenberg

Fernanda Viégas

CoGe

194

19 Jul 2024

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Ting-Yun Chang

Jesse Thomason

Robin Jia

414

19 Jun 2024

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

339

27 May 2024

TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

219

18 May 2024

Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Ruizhe Li

Yanjun Gao

KELM

332

06 May 2024

Improving Dictionary Learning with Gated Sparse Autoencoders

Senthooran Rajamanoharan

377

130

24 Apr 2024

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

International Conference on Machine Learning (ICML), 2024

286

10 Apr 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

555

250

28 Mar 2024

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

Xiaogeng Liu

226

26 Mar 2024

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

International Conference on Learning Representations (ICLR), 2024

Xin Wang

Yu Zheng

Zhongwei Wan

Mi Zhang

501

146

12 Mar 2024

Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

343

26 Feb 2024

AI-as-exploration: Navigating intelligence space

Dimitri Coelho Mollo

234

15 Jan 2024

The mechanistic basis of data dependence and abrupt learning in an in-context classification task

International Conference on Learning Representations (ICLR), 2023

Gautam Reddy

305

03 Dec 2023

Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching

International Conference on Learning Representations (ICLR), 2023

Aleksandar Makelov

Georg Lange

Neel Nanda

237

28 Nov 2023

LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

International Conference on Learning Representations (ICLR), 2023

460

20 Nov 2023

The Linear Representation Hypothesis and the Geometry of Large Language Models

International Conference on Machine Learning (ICML), 2023

461

318

07 Nov 2023

How do Language Models Bind Entities in Context?

International Conference on Learning Representations (ICLR), 2023

Jiahai Feng

Jacob Steinhardt

311

26 Oct 2023

What Algorithms can Transformers Learn? A Study in Length Generalization

International Conference on Learning Representations (ICLR), 2023

283

160

24 Oct 2023

Understanding Addition in Transformers

Abir Harrasse

Fazl Barez

597

19 Oct 2023

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

International Conference on Learning Representations (ICLR), 2023

Melanie Sclar

Yejin Choi

Yulia Tsvetkov

Alane Suhr

317

543

17 Oct 2023

Instilling Inductive Biases with Subnetworks

261

17 Oct 2023

Circuit Component Reuse Across Tasks in Transformer Language Models

International Conference on Learning Representations (ICLR), 2023

Jack Merullo

Carsten Eickhoff

Ellie Pavlick

368

12 Oct 2023

Low-Resource Languages Jailbreak GPT-4

434

266

03 Oct 2023

Sparse Autoencoders Find Highly Interpretable Features in Language Models

International Conference on Learning Representations (ICLR), 2023

662

775

15 Sep 2023

Large Language Models Are Not Robust Multiple Choice Selectors

International Conference on Learning Representations (ICLR), 2023

Jie Zhou

487

365

07 Sep 2023

Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions

Pouya Pezeshkpour

Estevam R. Hruschka

LRM

259

196

22 Aug 2023

Lost in the Middle: How Language Models Use Long Contexts

Transactions of the Association for Computational Linguistics (TACL), 2023

555

2,594

06 Jul 2023

Finding Neurons in a Haystack: Case Studies with Sparse Probing

519

286

02 May 2023

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

Neural Information Processing Systems (NeurIPS), 2023

1.0K

179

30 Apr 2023

Localizing Model Behavior with Path Patching

Nicholas W. Goldowsky-Dill

Chris MacLeod

L. Sato

Aryaman Arora

485

122

12 Apr 2023

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

International Conference on Machine Learning (ICML), 2023

...

384

1,621

03 Apr 2023