Spectral Filters, Dark Signals, and Attention Sinks

14 February 2024

Papers citing "Spectral Filters, Dark Signals, and Attention Sinks"

5 / 5 papers shown

Title
Outlier dimensions favor frequent tokens in language models Iuri Macocco Nora Graichen Gemma Boleda Marco Baroni 42 0 0 27 Mar 2025
More Expressive Attention with Negative Weights Ang Lv Ruobing Xie Shuaipeng Li Jiayi Liao X. Sun Zhanhui Kang Di Wang Rui Yan 30 0 0 11 Nov 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 49 18 0 02 Jul 2024
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 234 453 0 24 Sep 2022
Toy Models of Superposition Nelson Elhage Tristan Hume Catherine Olsson Nicholas Schiefer T. Henighan ... Sam McCandlish Jared Kaplan Dario Amodei Martin Wattenberg C. Olah AAML MILM 117 314 0 21 Sep 2022