2403.19521
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
28 March 2024
Ang Lv
Yuhan Chen
Kaiyi Zhang
Yulong Wang
Lifeng Liu
Ji-Rong Wen
Jian Xie
Rui Yan
KELM
Papers citing "Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models"
13 / 13 papers shown
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
54
0
0
27 Apr 2025
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
43
1
0
31 Oct 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
55
19
0
28 May 2024
Monotonic Representation of Numeric Properties in Language Models
Benjamin Heinzerling
Kentaro Inui
KELM
MILM
38
9
0
15 Mar 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
56
95
0
16 Feb 2024
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
Quan Tu
Shilong Fan
Zihang Tian
Rui Yan
70
58
0
02 Jan 2024
Characterizing Mechanisms for Factual Recall in Language Models
Qinan Yu
Jack Merullo
Ellie Pavlick
KELM
35
23
0
24 Oct 2023
An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l
James Dao
Yeu-Tong Lau
Can Rager
Jett Janiak
27
5
0
11 Oct 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
260
0
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
486
0
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
117
314
0
21 Sep 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019