2410.11767
Cited By
Analyzing (In)Abilities of SAEs via Formal Languages
15 October 2024
Abhinav Menon
Manish Shrivastava
David M. Krueger
Ekdeep Singh Lubana
Papers citing
"Analyzing (In)Abilities of SAEs via Formal Languages"
5 / 5 papers shown
How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders
Tatsuro Inaba
Kentaro Inui
Yusuke Miyao
Yohei Oseki
Benjamin Heinzerling
Yu Takagi
53
0
0
09 Mar 2025
Mixture of Experts Made Intrinsically Interpretable
Xingyi Yang
Constantin Venhoff
Ashkan Khakzar
Christian Schroeder de Witt
P. Dokania
Adel Bibi
Philip H. S. Torr
MoE
42
0
0
05 Mar 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
31
4
0
03 Mar 2025
FADE: Why Bad Descriptions Happen to Good Features
Bruno Puri
Aakriti Jain
Elena Golimblevskaia
Patrick Kahardipraja
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
43
0
0
24 Feb 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan
Julian Forsyth
Thomas Fel
M. Kowal
Konstantinos G. Derpanis
81
7
0
06 Feb 2025