A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

22 September 2024
David Chanin
James Wilken-Smith
Tomáš Dulka
Hardik Bhatnagar
Joseph Isaac Bloom

Papers citing "A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders"

49 / 49 papers shown
Title
Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders
Samuel Stevens
Jacob Beattie
T. Berger-Wolf
Yu-Chuan Su
48
0
0
21 Nov 2025
Re-envisioning Euclid Galaxy Morphology: Identifying and Interpreting Features with Sparse Autoencoders
John F. Wu
Michael Walmsley
81
0
0
27 Oct 2025
Exploratory Causal Inference in SAEnce
Tommaso Mencattini
Riccardo Cadei
Francesco Locatello
CML
89
0
0
15 Oct 2025
Superposition disentanglement of neural representations reveals hidden alignment
André Longon
David Klindt
Meenakshi Khosla
DRL
246
0
0
03 Oct 2025
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Xudong Zhu
Mohammad Mahdi Khalili
Zhihui Zhu
196
0
0
01 Oct 2025
Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling
Federico Tiblias
Irina Bigoulaeva
Jingcheng Niu
Simone Balloccu
Iryna Gurevych
88
0
0
01 Oct 2025
Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours
Rui Melo
Rui Abreu
C. Păsăreanu
114
0
0
01 Oct 2025
Measuring Sparse Autoencoder Feature Sensitivity
Claire Tian
Katherine Tian
Nathan Hu
148
0
0
28 Sep 2025
LLM Interpretability with Identifiable Temporal-Instantaneous Representation
Xiangchen Song
Jiaqi Sun
Zijian Li
Yujia Zheng
Kun Zhang
72
0
0
27 Sep 2025
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Elena Tutubalina
Ivan Oseledets
100
0
0
26 Sep 2025
ConceptViz: A Visual Analytics Approach for Exploring Concepts in Large Language Models
Xue Yang
Zhen Wen
Qiqi Jiang
Chenxiao Li
Yuwei Wu
Y. Yang
Yiyao Wang
Xiuqi Huang
Minfeng Zhu
Wei Chen
111
0
0
20 Sep 2025
Understanding sparse autoencoder scaling in the presence of feature manifolds
Eric J. Michaud
Liv Gorton
Tom McGrath
168
0
0
02 Sep 2025
Distribution-Aware Feature Selection for SAEs
Narmeen Oozeer
Nirmalendu Prakash
Michael Lan
Alice Rigg
Amirali Abdullah
59
0
0
29 Aug 2025
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
David Chanin
Adrià Garriga-Alonso
120
0
0
22 Aug 2025
Disentangling concept semantics via multilingual averaging in Sparse Autoencoders
Cliff O'Reilly
Ernesto Jiménez-Ruiz
Tillman Weyde
84
0
0
19 Aug 2025
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
Charles O'Neill
Mudith Jayasekara
Max Kirkby
77
0
0
12 Aug 2025
Mechanistic Indicators of Understanding in Large Language Models
Pierre Beckmann
Matthieu Queloz
151
1
0
07 Jul 2025
Stochastic Parameter Decomposition
Lucius Bushnaq
Dan Braun
Lee D. Sharkey
164
6
0
25 Jun 2025
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun
Alessandro Stolfo
Joshua Engels
Ben Wu
Senthooran Rajamanoharan
Mrinmaya Sachan
Max Tegmark
274
5
0
18 Jun 2025
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
Or Shafran
Atticus Geiger
Mor Geva
MILM
305
1
0
12 Jun 2025
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie
Urja Pawar
Phil Blandfort
William Bankes
David M. Krueger
Ekdeep Singh Lubana
Dmitrii Krasheninnikov
479
9
0
12 Jun 2025
Transferring Linear Features Across Language Models With Model Stitching
Alan Chen
Jack Merullo
Alessandro Stolfo
Ellie Pavlick
172
1
0
07 Jun 2025
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa
Thomas Fel
Ekdeep Singh Lubana
Bahareh Tolooshams
Demba Ba
196
6
0
03 Jun 2025
Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures
Mark Muchane
Sean Richardson
Kiho Park
Victor Veitch
176
2
0
01 Jun 2025
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
Daniil Laptev
Gleb Gerasimov
Yaroslav Aksenov
Daniil Gavrilov
Nikita Balagansky
161
0
0
28 May 2025
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
James Oldfield
Shawn Im
Yixuan Li
M. Nicolaou
Ioannis Patras
Grigorios G. Chrysos
MoE
252
0
0
27 May 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mengru Wang
Ziwen Xu
Shengyu Mao
Shumin Deng
Zhaopeng Tu
Ningyu Zhang
LLMSV
370
8
0
23 May 2025
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
Haiyan Zhao
Xuansheng Wu
Fan Yang
Bo Shen
Ninghao Liu
Mengnan Du
LLMSV
282
3
0
21 May 2025
Ensembling Sparse Autoencoders
Soham Gadgil
Chris Lin
Su-In Lee
234
1
0
21 May 2025
Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations
Aaron Jiaxun Li
Suraj Srinivas
Usha Bhalla
Himabindu Lakkaraju
AAML
275
3
0
21 May 2025
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Agam Goyal
Vedant Rathi
William Yeh
Yian Wang
Yuen Chen
Hari Sundaram
257
1
0
20 May 2025
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
Bofan Gong
Shiyang Lai
James A. Evans
Dawn Song
AAML
MILM
205
1
0
16 May 2025
Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
David Chanin
Tomáš Dulka
Adrià Garriga-Alonso
288
4
0
16 May 2025
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Rui Melo
Claudia Mamede
Andre Catarino
Rui Abreu
Henrique Lopes Cardoso
334
1
0
15 May 2025
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Boyi Deng
Yidan Zhang
Baosong Yang
Fuli Feng
293
2
0
08 May 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
222
0
0
28 Apr 2025
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Andrey V. Galichin
Alexey Dontsov
Polina Druzhinina
Anton Razzhigaev
Oleg Y. Rogov
Elena Tutubalina
Ivan Oseledets
LRM
183
16
0
24 Mar 2025
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Bart Bussmann
Noa Nabeshima
Adam Karvonen
Neel Nanda
257
43
0
21 Mar 2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Adam Karvonen
Can Rager
Johnny Lin
Curt Tigges
Joseph Isaac Bloom
...
Matthew Wearden
Arthur Conmy
Samuel Marks
Neel Nanda
MU
491
50
0
12 Mar 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew
Hubert Baniecki
P. Biecek
390
10
0
27 Feb 2025
Do Sparse Autoencoders Generalize? A Case Study of Answerability
Lovis Heindrich
Juil Sock
Fazl Barez
Veronika Thost
389
4
0
27 Feb 2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
249
5
0
25 Feb 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
312
39
0
23 Feb 2025
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Thomas Fel
Ekdeep Singh Lubana
Jacob S. Prince
M. Kowal
Victor Boutin
Isabel Papadimitriou
Binxu Wang
Martin Wattenberg
Demba Ba
Talia Konkle
199
23
0
18 Feb 2025
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan
Julian Forsyth
Thomas Fel
M. Kowal
Konstantinos G. Derpanis
264
22
0
06 Feb 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Blake Bullwinkel
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangde
LLMSV
339
38
0
18 Nov 2024
An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
Ahmed Abdulaal
Hugo Fry
Nina Montaña-Brown
Ayodeji Ijishakin
Jack Gao
Stephanie L. Hyland
Daniel C. Alexander
Daniel Coelho De Castro
MedIm
255
18
0
04 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
505
77
0
02 Jul 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
451
233
0
28 Mar 2024