Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

19 July 2024
Senthooran Rajamanoharan
Tom Lieberum
Nicolas Sonnerat
Arthur Conmy
Vikrant Varma
János Kramár
Neel Nanda

Papers citing "Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders"

50 / 128 papers shown
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Rui Melo
Claudia Mamede
Andre Catarino
Rui Abreu
Henrique Lopes Cardoso
506
1
0
10 Apr 2026
AlignSAE: Concept-Aligned Sparse Autoencoders
Minglai Yang
Xinyu Guo
Mihai Surdeanu
Liangming Pan
Steven Bethard
LLMSV
452
2
0
01 Dec 2025
SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models
Jiaojiao Han
Wujiang Xu
Mingyu Jin
Mengnan Du
LRM
158
2
0
25 Nov 2025
Sparse Autoencoders are Topic Models
Leander Girrbach
Zeynep Akata
165
1
0
20 Nov 2025
Weight-sparse transformers have interpretable circuits
Leo Gao
Achyuta Rajaram
Jacob Coxon
Soham V. Govande
Bowen Baker
Dan Mossing
MILM
327
20
0
17 Nov 2025
Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts
Xinyuan Yan
Shusen Liu
Kowshik Thopalli
Bei Wang
214
2
0
08 Nov 2025
Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder
Zhen Xu
Zhen Tan
Song Wang
Kaidi Xu
Tianlong Chen
MoE
336
0
0
07 Nov 2025
Making Interpretable Discoveries from Unstructured Data: A High-Dimensional Multiple Hypothesis Testing Approach
Jacob Carlson
154
0
0
03 Nov 2025
Finding Manifolds With Bilinear Autoencoders
Thomas Dooms
Ward Gauderis
167
2
0
19 Oct 2025
Time-Aware Feature Selection: Adaptive Temporal Masking for Stable Sparse Autoencoder Training
T. Ed Li
Junyu Ren
94
1
0
09 Oct 2025
Memory Retrieval and Consolidation in Large Language Models through Function Tokens
Shaohua Zhang
Yuan Lin
Hang Li
LLMAG
122
0
0
09 Oct 2025
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Angie Boggust
Donghao Ren
Yannick Assogba
Dominik Moritz
Arvind Satyanarayan
Fred Hohman
192
2
0
07 Oct 2025
Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders
Xu Wang
Yan Hu
Benyou Wang
Difan Zou
LLMSV
262
2
0
04 Oct 2025
Interpreting Language Models Through Concept Descriptions: A Survey
Nils Feldhus
Laura Kopf
MILM
196
3
0
01 Oct 2025
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Xudong Zhu
Mohammad Mahdi Khalili
Zhihui Zhu
303
0
0
01 Oct 2025
Sparse Autoencoders Make Audio Foundation Models more Explainable
Théo Mariotte
Martin Lebourdais
Antonio Almudévar
Marie Tahon
Alfonso Ortega
Nicolas Dugué
160
1
0
29 Sep 2025
Binary Sparse Coding for Interpretability
Lucia Quirke
Stepan Shabalin
Nora Belrose
134
3
0
29 Sep 2025
LLM Interpretability with Identifiable Temporal-Instantaneous Representation
Xiangchen Song
Jiaqi Sun
Zijian Li
Yujia Zheng
Kun Zhang
188
2
0
27 Sep 2025
Analysis of Variational Sparse Autoencoders
Zachary Baker
Yuxiao Li
DRL
373
0
0
26 Sep 2025
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
Anton Korznikov
Andrey V. Galichin
Alexey Dontsov
Oleg Y. Rogov
Elena Tutubalina
Ivan Oseledets
188
3
0
26 Sep 2025
Binary Autoencoder for Mechanistic Interpretability of Large Language Models
Hakaze Cho
Haolin Yang
Brian M. Kurkoski
Naoya Inoue
MQ
292
0
0
25 Sep 2025
Towards Atoms of Large Language Models
Chenhui Hu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
149
0
0
25 Sep 2025
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
Mariam Mahran
Katharina Simbeck
386
0
0
24 Sep 2025
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Katharina Simbeck
Mariam Mahran
MILM
LLMSV
304
2
0
22 Sep 2025
Evolution of Concepts in Language Model Pre-Training
Xuyang Ge
Wentao Shu
Jiaxing Wu
Yunhua Zhou
Zhengfu He
Xipeng Qiu
159
4
0
21 Sep 2025
ConceptViz: A Visual Analytics Approach for Exploring Concepts in Large Language Models
Xue Yang
Zhen Wen
Qiqi Jiang
Chenxiao Li
Yuwei Wu
Y. Yang
Yiyao Wang
Xiuqi Huang
Minfeng Zhu
Wei Chen
208
1
0
20 Sep 2025
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
Tomoya Yamashita
Akira Ito
Yuuki Yamanaka
Masanori Yamada
Takayuki Miura
Toshiki Shibahara
MU
KELM
157
1
0
19 Sep 2025
The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
Jeremias Lino Ferrao
Matthijs van der Lende
Ilija Lichkovski
Clement Neo
LLMSV
349
1
0
16 Sep 2025
Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
Antonio Bărbălău
Cristian Daniel Păduraru
Teodor Poncu
Alexandru Tifrea
Elena Burceanu
250
1
0
13 Sep 2025
Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework
Jiaqi Weng
Han Zheng
Hanyu Zhang
Qinqin He
Jialing Tao
Hui Xue
Zhixuan Chu
Xiting Wang
ELM
LLMSV
LRM
180
2
0
11 Sep 2025
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
Deniz Bayazit
Aaron Mueller
Antoine Bosselut
160
1
0
05 Sep 2025
Mechanistic Interpretability with Sparse Autoencoder Neural Operators
Bahareh Tolooshams
Ailsa Shen
A. Anandkumar
175
0
0
03 Sep 2025
Understanding sparse autoencoder scaling in the presence of feature manifolds
Eric J. Michaud
Liv Gorton
Tom McGrath
297
2
0
02 Sep 2025
CE-Bench: Towards a Reliable Contrastive Evaluation Benchmark of Interpretability of Sparse Autoencoders
Alex Gulko
Yusen Peng
Sachin Kumar
199
0
0
31 Aug 2025
AdaptiveK Sparse Autoencoders: Dynamic Sparsity Allocation for Interpretable LLM Representations
Yifei Yao
Mengnan Du
220
1
0
24 Aug 2025
Dimensional Collapse in Transformer Attention Outputs: A Challenge for Sparse Dictionary Learning
Junxuan Wang
Xuyang Ge
Wentao Shu
Zhengfu He
Xipeng Qiu
241
0
0
23 Aug 2025
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
David Chanin
Adrià Garriga-Alonso
253
5
0
22 Aug 2025
Evaluating Sparse Autoencoders for Monosemantic Representation
Moghis Fereidouni
Muhammad Umair Haider
Peizhong Ju
A.B. Siddique
200
0
0
20 Aug 2025
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
Seonglae Cho
Zekun Wu
Adriano Soares Koshiyama
LLMSV
383
0
0
18 Aug 2025
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
Charles O'Neill
Mudith Jayasekara
Max Kirkby
154
2
0
12 Aug 2025
Interpretable Reward Model via Sparse Autoencoder
Shuyi Zhang
Wei Shi
Cunchun Li
Jiayi Liao
Tao Liang
Hengxing Cai
744
7
0
12 Aug 2025
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Ziqian Zhong
Aditi Raghunathan
259
5
0
31 Jul 2025
Interpreting CFD Surrogates through Sparse Autoencoders
Yeping Hu
Shusen Liu
AI4CE
192
1
0
21 Jul 2025
Semantic Convergence: Investigating Shared Representations Across Scaled LLMs
Daniel Son
Sanjana Rathore
Andrew Rufail
Adrian Simon
Daniel Zhang
Soham Dave
Cole Blondin
Kevin Zhu
Sean O'Brien
175
0
0
21 Jul 2025
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
Boyi Deng
Yu Wan
Baosong Yang
Fei Huang
Wenjie Wang
Fuli Feng
242
1
0
20 Jul 2025
From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease
Peter William VanHarn Plantinga
Jen-Kai Chen
Roozbeh Sattari
M. R
Denise Klein
199
3
0
16 Jul 2025
SAFER: Probing Safety in Reward Models with Sparse Autoencoder
Sihang Li
Wei Shi
Ziyuan Xie
Tao Liang
OffRL
227
2
0
01 Jul 2025
Persona Features Control Emergent Misalignment
Miles Wang
Tom Dupré la Tour
Olivia Watkins
Alex Makelov
Ryan A. Chi
...
Jeffrey Wang
Achyuta Rajaram
Johannes Heidecke
Tejal Patwardhan
Dan Mossing
342
38
0
24 Jun 2025
Sparse Feature Coactivation Reveals Causal Semantic Modules in Large Language Models
Ruixuan Deng
Xiaoyang Hu
Miles Gilberti
Shane Storks
Aman Taxali
Mike Angstadt
Chandra S. Sripada
Joyce Chai
KELM
MILM
ReLM
LLMSV
252
0
0
22 Jun 2025
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun
Alessandro Stolfo
Joshua Engels
Ben Wu
Senthooran Rajamanoharan
Mrinmaya Sachan
Max Tegmark
438
7
0
18 Jun 2025