Sparse Autoencoders Do Not Find Canonical Units of Analysis

7 February 2025
Patrick Leask, Bart Bussmann, Michael T. Pearce, Joseph Isaac Bloom, Curt Tigges, Noura Al Moubayed, Lee D. Sharkey, Neel Nanda
arXiv (abs) | PDF | HTML

Papers citing "Sparse Autoencoders Do Not Find Canonical Units of Analysis"

12 / 12 papers shown
  1. Dense SAE Latents Are Features, Not Bugs
     Xiaoqing Sun, Alessandro Stolfo, Joshua Engels, Ben Wu, Senthooran Rajamanoharan, Mrinmaya Sachan, Max Tegmark
     18 Jun 2025

  2. Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
     Seongwan Park, Taeklim Kim, Youngjoong Ko
     28 May 2025

  3. Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
     Mengru Wang, Ziwen Xu, Shengyu Mao, Shumin Deng, Zhaopeng Tu, Ningyu Zhang
     Communities: LLMSV
     23 May 2025

  4. Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
     Patrick Leask, Neel Nanda, Noura Al Moubayed
     23 May 2025

  5. Ensembling Sparse Autoencoders
     Soham Gadgil, Chris Lin, Su-In Lee
     21 May 2025

  6. Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
     Agam Goyal, Vedant Rathi, William Yeh, Yian Wang, Yuen Chen, Hari Sundaram
     20 May 2025

  7. SplInterp: Improving our Understanding and Training of Sparse Autoencoders
     Jeremy Budd, Javier Ideami, Benjamin Macdowall Rynne, Keith Duggar, Randall Balestriero
     17 May 2025

  8. Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
     Kola Ayonrinde, Louis Jaburi
     Communities: XAI
     02 May 2025

  9. Representation Learning on a Random Lattice
     Aryeh Brill
     Communities: OOD, FAtt, AI4CE
     28 Apr 2025

  10. Learning Multi-Level Features with Matryoshka Sparse Autoencoders
      Bart Bussmann, Noa Nabeshima, Adam Karvonen, Neel Nanda
      21 Mar 2025

  11. How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders
      Tatsuro Inaba, Kentaro Inui, Yusuke Miyao, Yohei Oseki, Benjamin Heinzerling, Yu Takagi
      09 Mar 2025

  12. Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
      Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba
      03 Mar 2025