InversionView: A General-Purpose Method for Reading Information from Neural Activations

27 May 2024

Papers citing "InversionView: A General-Purpose Method for Reading Information from Neural Activations"


Title | Authors | Tag | Counts | Date
Model Lakes | Koyena Pal, David Bau, Renée J. Miller | - | 63 0 0 | 24 Feb 2025
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model | Michael Hanna, Ollie Liu, Alexandre Variengien | LRM | 184 116 0 | 30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models | Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson | KELM | 189 260 0 | 28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | - | 210 486 0 | 01 Nov 2022
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | - | 221 402 0 | 24 Feb 2021