2409.09951
Optimal ablation for interpretability
16 September 2024
Maximilian Li
Lucas Janson
FAtt
Papers citing "Optimal ablation for interpretability"
2 / 2 papers shown
Title
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
141
0
0
17 Feb 2025
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
205
486
0
01 Nov 2022