2405.10927
Cited By
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
17 May 2024
Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn
Papers citing "Using Degeneracy in the Loss Landscape for Mechanistic Interpretability"
3 / 3 papers shown
Title
Review and Prospect of Algebraic Research in Equivalent Framework between Statistical Mechanics and Machine Learning Theory
Sumio Watanabe
25
1
0
31 May 2024
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn
FAtt
28
3
0
17 May 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
210
491
0
01 Nov 2022