arXiv:2405.10928
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
17 May 2024
Lucius Bushnaq
Stefan Heimersheim
Nicholas Goldowsky-Dill
Dan Braun
Jake Mendel
Kaarel Hänni
Avery Griffin
Jörn Stöhler
Magdalena Wache
Marius Hobbhahn
FAtt
Papers citing "The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks"
2 / 2 papers shown
Title
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
486
0
01 Nov 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
117
314
0
21 Sep 2022