Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.15255
Cited By
How to use and interpret activation patching
23 April 2024
Stefan Heimersheim
Neel Nanda
Re-assign community
ArXiv
PDF
HTML
Papers citing
"How to use and interpret activation patching"
11 / 11 papers shown
Title
Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna
Afra Alishahi
Frédéric Blain
Eva Vanmassenhove
22
0
0
13 May 2025
(How) Do Language Models Track State?
Belinda Z. Li
Zifan Carl Guo
Jacob Andreas
LRM
44
0
0
04 Mar 2025
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan
Arnab Sen Sharma
Silvio Amir
David Bau
Byron C. Wallace
80
0
0
20 Feb 2025
Exploring Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
57
1
0
17 Feb 2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
75
10
0
21 Nov 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
80
1
0
02 Oct 2024
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models
Geonhee Kim
Marco Valentino
André Freitas
LRM
AI4CE
28
7
0
16 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
75
19
0
02 Jul 2024
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
189
119
0
30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
261
0
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
494
0
01 Nov 2022
1