SplInterp: Improving our Understanding and Training of Sparse Autoencoders

17 May 2025

Papers citing "SplInterp: Improving our Understanding and Training of Sparse Autoencoders"

Sparse Autoencoders Do Not Find Canonical Units of Analysis
Patrick Leask, Bart Bussmann, Michael T. Pearce, Joseph Isaac Bloom, Curt Tigges, Noura Al Moubayed, Lee D. Sharkey, Neel Nanda
121 15 0
07 Feb 2025

Sparse Autoencoders Can Interpret Randomly Initialized Transformers
Thomas Heap, Tim Lawson, Lucy Farnik, Laurence Aitchison
73 17 0
29 Jan 2025