Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.03855
Cited By
Challenges in Mechanistically Interpreting Model Representations
6 February 2024
Satvik Golechha
James Dao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Challenges in Mechanistically Interpreting Model Representations"
5 / 5 papers shown
Title
Modular Training of Neural Networks aids Interpretability
Satvik Golechha
Maheep Chaudhary
Joan Velja
Alessandro Abate
Nandi Schoots
74
0
0
04 Feb 2025
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
91
164
0
10 Oct 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
178
116
0
30 Apr 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas F. Icard
Noah D. Goodman
CML
73
98
0
05 Mar 2023
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
237
453
0
24 Sep 2022
1