Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.04625
Cited By
Copy Suppression: Comprehensively Understanding an Attention Head
6 October 2023
Callum McDougall
Arthur Conmy
Cody Rushing
Thomas McGrath
Neel Nanda
MILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Copy Suppression: Comprehensively Understanding an Attention Head"
9 / 9 papers shown
Title
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
J. Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J. Zhang
Xipeng Qiu
70
0
0
29 Apr 2025
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Yifan Zhang
Wenyu Du
Dongming Jin
Jie Fu
Zhi Jin
LRM
46
0
0
27 Feb 2025
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
ZhongXiang Sun
Xiaoxue Zang
Kai Zheng
Yang Song
Jun Xu
Xiao Zhang
Weijie Yu
Yang Song
Han Li
55
7
0
15 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
75
19
0
02 Jul 2024
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
Tony T. Wang
Miles Wang
Kaivu Hariharan
Nir Shavit
16
2
0
14 Dec 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
189
119
0
30 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
494
0
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
458
0
24 Sep 2022
1