Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs

11 February 2024

Papers citing "Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs"

21 / 21 papers shown

Title
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries Tianyi Lorena Yan Robin Jia KELM MU 46 0 0 27 Feb 2025
Mechanistic Understanding of Language Models in Syntactic Code Completion Samuel Miller Daking Rai Ziyu Yao LRM 44 0 0 20 Feb 2025
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Yixin Ou Yunzhi Yao N. Zhang Hui Jin Jiacheng Sun Shumin Deng Z. Li H. Chen KELM CLL 49 0 0 16 Feb 2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models Javier Ferrando Oscar Obeso Senthooran Rajamanoharan Neel Nanda 75 10 0 21 Nov 2024
ResiDual Transformer Alignment with Spectral Decomposition Lorenzo Basile Valentino Maiorca Luca Bortolussi Emanuele Rodolà Francesco Locatello 45 1 0 31 Oct 2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization Phillip Guo Aaquib Syed Abhay Sheshadri Aidan Ewart Gintare Karolina Dziugaite KELM MU 31 5 0 16 Oct 2024
Attention Heads of Large Language Models: A Survey Zifan Zheng Yezhaohui Wang Yuxin Huang Shichao Song Mingchuan Yang Bo Tang Feiyu Xiong Zhiyu Li LRM 52 21 0 05 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 75 18 0 02 Jul 2024
$$\text{Memory}^3$: Language Modeling with Explicit Memory$ $\text{Memory}^3$ : Language Modeling with Explicit Memory Hongkang Yang Zehao Lin Wenjin Wang Hao Wu Zhiyu Li ... Yu Yu Kai Chen Feiyu Xiong Linpeng Tang Weinan E 48 11 0 01 Jul 2024
Interpreting Attention Layer Outputs with Sparse Autoencoders Connor Kissane Robert Krzyzanowski Joseph Isaac Bloom Arthur Conmy Neel Nanda MILM 21 17 0 25 Jun 2024
Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience Martina G. Vilas Federico Adolfi David Poeppel Gemma Roig 35 5 0 03 Jun 2024
Knowledge Circuits in Pretrained Transformers Yunzhi Yao Ningyu Zhang Zekun Xi Meng Wang Ziwen Xu Shumin Deng Huajun Chen KELM 64 19 0 28 May 2024
Knowledge Conflicts for LLMs: A Survey Rongwu Xu Zehan Qi Zhijiang Guo Cunxiang Wang Hongru Wang Yue Zhang Wei Xu 198 91 0 13 Mar 2024
An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l James Dao Yeu-Tong Lau Can Rager Jett Janiak 35 5 0 11 Oct 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing Wes Gurnee Neel Nanda Matthew Pauly Katherine Harvey Dmitrii Troitskii Dimitris Bertsimas MILM 153 186 0 02 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 189 261 0 28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 210 491 0 01 Nov 2022
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 240 456 0 24 Sep 2022
Toy Models of Superposition Nelson Elhage Tristan Hume Catherine Olsson Nicholas Schiefer T. Henighan ... Sam McCandlish Jared Kaplan Dario Amodei Martin Wattenberg C. Olah AAML MILM 120 316 0 21 Sep 2022
Measuring and Improving Consistency in Pretrained Language Models Yanai Elazar Nora Kassner Shauli Ravfogel Abhilasha Ravichander Eduard H. Hovy Hinrich Schütze Yoav Goldberg HILM 258 343 0 01 Feb 2021
Language Models as Knowledge Bases? Fabio Petroni Tim Rocktaschel Patrick Lewis A. Bakhtin Yuxiang Wu Alexander H. Miller Sebastian Riedel KELM AI4MH 406 2,576 0 03 Sep 2019