Does Transformer Interpretability Transfer to RNNs?

9 April 2024

Papers citing "Does Transformer Interpretability Transfer to RNNs?"

6 / 6 papers shown

Title
Looking Beyond The Top-1: Transformers Determine Top Tokens In Order Daria Lioubashevski Tomer Schlank Gabriel Stanovsky Ariel Goldstein 29 1 0 26 Oct 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability Aaron Mueller Jannik Brinkmann Millicent Li Samuel Marks Koyena Pal ... Arnab Sen Sharma Jiuding Sun Eric Todd David Bau Yonatan Belinkov CML 40 18 0 02 Aug 2024
MambaLRP: Explaining Selective State Space Sequence Models F. Jafari G. Montavon Klaus-Robert Müller Oliver Eberle Mamba 47 9 0 11 Jun 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Samuel Marks Max Tegmark HILM 91 164 0 10 Oct 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 486 0 01 Nov 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 245 1,977 0 31 Dec 2020