2502.12892
Cited By
v1
v2 (latest)
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
18 February 2025
Thomas Fel
Ekdeep Singh Lubana
Jacob S. Prince
M. Kowal
Victor Boutin
Isabel Papadimitriou
Binxu Wang
Martin Wattenberg
Demba Ba
Talia Konkle
Papers citing "Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models"
9 / 9 papers shown
Title
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa
Thomas Fel
Ekdeep Singh Lubana
Bahareh Tolooshams
Demba Ba
49
0
0
03 Jun 2025
BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
Lindia Tjuatja
Graham Neubig
28
0
0
02 Jun 2025
Ensembling Sparse Autoencoders
Soham Gadgil
Chris Lin
Su-In Lee
79
0
0
21 May 2025
Interpreting the linear structure of vision-language model embedding spaces
Isabel Papadimitriou
Huangyuan Su
Thomas Fel
Naomi Saphra
Sham Kakade
VLM
120
1
0
16 Apr 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
105
10
0
03 Mar 2025
Archetypal Analysis for Binary Data
Anna Emilie J. Wedenborg
Morten Mørup
155
2
0
06 Feb 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan
Julian Forsyth
Thomas Fel
M. Kowal
Konstantinos G. Derpanis
146
10
0
06 Feb 2025
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
108
11
0
07 Nov 2024
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon
Manish Shrivastava
David M. Krueger
Ekdeep Singh Lubana
109
8
0
15 Oct 2024