arXiv 2310.07325
An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l
11 October 2023
James Dao
Yeu-Tong Lau
Can Rager
Jett Janiak
Papers citing "An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l" (4 papers)
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed
Can Rager
Arthur Conmy
16 Oct 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
02 May 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
01 Nov 2022
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
26 Jan 2022