Copy Suppression: Comprehensively Understanding an Attention Head

Copy Suppression: Comprehensively Understanding an Attention Head

6 October 2023

Callum McDougall

Papers citing "Copy Suppression: Comprehensively Understanding an Attention Head"

9 / 9 papers shown

Title
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition Zhengfu He J. Wang Rui Lin Xuyang Ge Wentao Shu Qiong Tang J. Zhang Xipeng Qiu 70 0 0 29 Apr 2025
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking Yifan Zhang Wenyu Du Dongming Jin Jie Fu Zhi Jin LRM 46 0 0 27 Feb 2025
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability ZhongXiang Sun Xiaoxue Zang Kai Zheng Yang Song Jun Xu Xiao Zhang Weijie Yu Yang Song Han Li 55 7 0 15 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 75 19 0 02 Jul 2024
Forbidden Facts: An Investigation of Competing Objectives in Llama-2 Tony T. Wang Miles Wang Kaivu Hariharan Nir Shavit 16 2 0 14 Dec 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing Wes Gurnee Neel Nanda Matthew Pauly Katherine Harvey Dmitrii Troitskii Dimitris Bertsimas MILM 153 186 0 02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model Michael Hanna Ollie Liu Alexandre Variengien LRM 189 119 0 30 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 494 0 01 Nov 2022
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 240 458 0 24 Sep 2022