Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.17806
Cited By
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
26 March 2024
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms"
11 / 11 papers shown
Title
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang
Benjamin Bergen
38
0
0
21 Apr 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
41
0
0
14 Mar 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
L. Zhang
Lijie Hu
Di Wang
LRM
79
0
0
17 Feb 2025
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
MoE
AI4CE
52
0
0
13 Feb 2025
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
47
18
0
02 Jul 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar
Alexander Wettig
Dan Friedman
Danqi Chen
41
14
0
24 Jun 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
39
19
0
28 May 2024
AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár
Tom Lieberum
Rohin Shah
Neel Nanda
KELM
31
40
0
01 Mar 2024
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
170
116
0
30 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
205
486
0
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
232
453
0
24 Sep 2022
1