Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
arXiv:2406.09519 · 13 June 2024

Papers citing "Talking Heads: Understanding Inter-layer Communication in Transformer Language Models"

17 citing papers shown

Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang, Benjamin Bergen
21 Apr 2025

The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg
Tags: LRM
19 Apr 2025

Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond
Frank Yingjie Huo, Neil F. Johnson
06 Apr 2025

Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman, Lucius Bushnaq, Lee D. Sharkey
31 Mar 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan, Robin Jia
Tags: KELM, MU
27 Feb 2025

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
Tags: MoE, AI4CE
13 Feb 2025

Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong, Y. Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, ..., Min-Hung Chen, Yoshi Suhara, Y. Lin, Jan Kautz, Pavlo Molchanov
Tags: Mamba
20 Nov 2024

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, ..., Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov
Tags: CML
02 Aug 2024

Relational Composition in Neural Networks: A Survey and Call to Action
Martin Wattenberg, Fernanda Viégas
Tags: CoGe
19 Jul 2024

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Ting-Yun Chang, Jesse Thomason, Robin Jia
19 Jun 2024

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe
Tags: AI4CE
10 Apr 2024

AI-as-exploration: Navigating intelligence space
Dimitri Coelho Mollo
15 Jan 2024

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
Tags: MILM
02 May 2023

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien
Tags: LRM
30 Apr 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
Tags: AAML, MILM
21 Sep 2022

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
Tags: AILaw, LRM
18 Apr 2021