2505.12268
$K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks
18 May 2025
Pratim Chowdhary
Peter Chin
Deeparnab Chakrabarty
Papers citing
"$K$-MSHC: Unmasking Minimally Sufficient Head Circuits in Large Language Models with Experiments on Syntactic Classification Tasks"
19 / 19 papers shown
Title
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Thomas Bush
Stephen Chung
Usman Anwar
Adrià Garriga-Alonso
David M. Krueger
LM&Ro
OffRL
74
6
0
02 Apr 2025
Language Models Use Trigonometry to Do Addition
Subhash Kantamneni
Max Tegmark
LRM
83
14
0
02 Feb 2025
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Yaniv Nikankin
Anja Reusch
Aaron Mueller
Yonatan Belinkov
AIFin
LRM
129
33
0
28 Oct 2024
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
Ruochen Zhang
Qinan Yu
Matianyu Zang
Carsten Eickhoff
Ellie Pavlick
86
6
0
11 Oct 2024
Transcoders Find Interpretable LLM Feature Circuits
Jacob Dunefsky
Philippe Chlenski
Neel Nanda
85
34
0
17 Jun 2024
Scaling and evaluating sparse autoencoders
Leo Gao
Tom Dupré la Tour
Henk Tillman
Gabriel Goh
Rajan Troll
Alec Radford
Ilya Sutskever
Jan Leike
Jeffrey Wu
98
162
0
06 Jun 2024
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
Shreyas Kapur
Vasil Georgiev
Cameron Allen
Scott Emmons
Stuart J. Russell
110
13
0
02 Jun 2024
Automatically Identifying Local and Global Circuits with Linear Computation Graphs
Xuyang Ge
Fukang Zhu
Wentao Shu
Junxuan Wang
Zhengfu He
Xipeng Qiu
93
10
0
22 May 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
171
159
0
28 Mar 2024
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Sohee Yang
E. Gribovskaya
Nora Kassner
Mor Geva
Sebastian Riedel
ReLM
LRM
129
113
0
26 Feb 2024
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
101
54
0
24 May 2023
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
412
4,606
0
27 Oct 2021
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
199
850
0
29 Dec 2020
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt
Alicia Parrish
Haokun Liu
Anhad Mohananey
Wei Peng
Sheng-Fu Wang
Samuel R. Bowman
137
496
0
02 Dec 2019
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
269
1,609
0
11 Jun 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
120
1,072
0
25 May 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
130
1,151
0
23 May 2019
BERT Rediscovers the Classical NLP Pipeline
Ian Tenney
Dipanjan Das
Ellie Pavlick
MILM
SSeg
155
1,486
0
15 May 2019
Understanding intermediate layers using linear classifier probes
Guillaume Alain
Yoshua Bengio
FAtt
177
957
0
05 Oct 2016