Neuron-Level Knowledge Attribution in Large Language Models
19 December 2023
Zeping Yu, Sophia Ananiadou
FAtt, KELM
arXiv: 2312.12141

Papers citing "Neuron-Level Knowledge Attribution in Large Language Models"

10 / 10 papers shown
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou
MoE
09 May 2025
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu, Sophia Ananiadou
17 Nov 2024
Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu
19 Feb 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea
03 Jan 2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
MILM
02 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
KELM
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022
Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
AAML, MILM
21 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
04 Mar 2022
Investigating Saturation Effects in Integrated Gradients
Vivek Miglani, Narine Kokhlikyan, B. Alsallakh, Miguel Martin, Orion Reblitz-Richardson
FAtt
23 Oct 2020