Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

3 March 2025

Papers citing "Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry"


Title | Authors | Tags | Metrics | Date
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i | Kola Ayonrinde, Louis Jaburi | MILM | 63, 1, 0 | 01 May 2025
Representation Learning on a Random Lattice | Aryeh Brill | OOD, FAtt, AI4CE | 63, 0, 0 | 28 Apr 2025
Interpreting the Linear Structure of Vision-language Model Embedding Spaces | Isabel Papadimitriou, Huangyuan Su, Thomas Fel, Naomi Saphra, Sham Kakade, Stephanie Gil | VLM | 37, 0, 0 | 16 Apr 2025
Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models | Michael Lan, Philip H. S. Torr, Austin Meek, Ashkan Khakzar, David M. Krueger, Fazl Barez | - | 28, 9, 0 | 09 Oct 2024