LLM Circuit Analyses Are Consistent Across Training and Scale
Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman
15 July 2024 · arXiv:2407.10827
Papers citing "LLM Circuit Analyses Are Consistent Across Training and Scale" (16 papers):
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang, Benjamin Bergen (21 Apr 2025)

Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng (02 Oct 2024)

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao (02 Jul 2024)

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe (10 Apr 2024)

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov (26 Mar 2024)

AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda (01 Mar 2024)

Information Flow Routes: Automatically Interpreting Language Models at Scale
Javier Ferrando, Elena Voita (27 Feb 2024)

OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ..., Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi (01 Feb 2024)

Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed, Can Rager, Arthur Conmy (16 Oct 2023)

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas (02 May 2023)

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien (30 Apr 2023)

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah (24 Sep 2022)

Word Acquisition in Neural Language Models
Tyler A. Chang, Benjamin Bergen (05 Oct 2021)

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov (24 Feb 2021)

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei (23 Jan 2020)