2505.11756
v1
v2 (latest)
Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
16 May 2025
David Chanin
Tomáš Dulka
Adrià Garriga-Alonso
Github (3★)
Papers citing "Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders"
7 / 7 papers shown
Title
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Elena Tutubalina
Ivan Oseledets
104
0
0
26 Sep 2025
Towards Atoms of Large Language Models
Chenhui Hu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
96
0
0
25 Sep 2025
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
David Chanin
Adrià Garriga-Alonso
128
0
0
22 Aug 2025
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Bart Bussmann
Noa Nabeshima
Adam Karvonen
Neel Nanda
257
43
0
21 Mar 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
312
39
0
23 Feb 2025
Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels
Logan Riggs
Max Tegmark
LLMSV
274
29
0
18 Oct 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
451
233
0
28 Mar 2024