2502.04878
Cited By
Sparse Autoencoders Do Not Find Canonical Units of Analysis
7 February 2025
Patrick Leask
Bart Bussmann
Michael T. Pearce
Joseph Isaac Bloom
Curt Tigges
Noura Al Moubayed
Lee D. Sharkey
Neel Nanda
Papers citing "Sparse Autoencoders Do Not Find Canonical Units of Analysis"
12 / 12 papers shown
Title
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun
Alessandro Stolfo
Joshua Engels
Ben Wu
Senthooran Rajamanoharan
Mrinmaya Sachan
Max Tegmark
54
0
0
18 Jun 2025
Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
Seongwan Park
Taeklim Kim
Youngjoong Ko
18
0
0
28 May 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Mengru Wang
Ziwen Xu
Shengyu Mao
Shumin Deng
Zhaopeng Tu
Ningyu Zhang
LLMSV
114
0
0
23 May 2025
Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
Patrick Leask
Neel Nanda
Noura Al Moubayed
87
1
0
23 May 2025
Ensembling Sparse Autoencoders
Soham Gadgil
Chris Lin
Su-In Lee
87
0
0
21 May 2025
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Agam Goyal
Vedant Rathi
William Yeh
Yian Wang
Yuen Chen
Hari Sundaram
100
0
0
20 May 2025
SplInterp: Improving our Understanding and Training of Sparse Autoencoders
Jeremy Budd
Javier Ideami
Benjamin Macdowall Rynne
Keith Duggar
Randall Balestriero
99
0
0
17 May 2025
Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
Kola Ayonrinde
Louis Jaburi
XAI
147
1
0
02 May 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
120
0
0
28 Apr 2025
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Bart Bussmann
Noa Nabeshima
Adam Karvonen
Neel Nanda
126
13
0
21 Mar 2025
How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders
Tatsuro Inaba
Kentaro Inui
Yusuke Miyao
Yohei Oseki
Benjamin Heinzerling
Yu Takagi
104
1
0
09 Mar 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
105
10
0
03 Mar 2025