arXiv:2410.14670
Decomposing The Dark Matter of Sparse Autoencoders
18 October 2024
Joshua Engels
Logan Riggs
Max Tegmark
LLMSV
Papers citing "Decomposing The Dark Matter of Sparse Autoencoders"
10 / 10 papers shown
Title
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
J. Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J. Zhang
Xipeng Qiu
68
0
0
29 Apr 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
63
0
0
28 Apr 2025
Robustly identifying concepts introduced during chat fine-tuning using crosscoders
Julian Minder
Clement Dumas
Caden Juang
Bilal Chughtai
Neel Nanda
23
0
0
03 Apr 2025
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
Sewoong Lee
Adam Davies
Marc E. Canby
J. Hockenmaier
LLMSV
55
0
0
31 Mar 2025
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman
Lucius Bushnaq
Lee D. Sharkey
39
0
0
31 Mar 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
34
4
0
03 Mar 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
51
3
0
23 Feb 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
LLMSV
52
9
0
18 Nov 2024
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel J. Lee
Stefan Heimersheim
AAML
24
4
0
16 Oct 2024
Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models
Michael Lan
Philip H. S. Torr
Austin Meek
Ashkan Khakzar
David M. Krueger
Fazl Barez
28
9
0
09 Oct 2024