ResearchTrend.AI
Sparse Autoencoders Can Interpret Randomly Initialized Transformers


29 January 2025
Thomas Heap
Tim Lawson
Lucy Farnik
Laurence Aitchison

Papers citing "Sparse Autoencoders Can Interpret Randomly Initialized Transformers"

15 / 15 papers shown
Rethinking Explainability in the Era of Multimodal AI
Chirag Agarwal
22
0
0
16 Jun 2025
Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures
Mark Muchane
Sean Richardson
Kiho Park
Victor Veitch
36
0
0
01 Jun 2025
Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning
Stepan Shabalin
Ayush Panda
Dmitrii Kharlapenko
Abdur Raheem Ali
Yixiong Hao
Arthur Conmy
DiffM
47
0
0
30 May 2025
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
Vadim Kurochkin
Yaroslav Aksenov
Daniil Laptev
Daniil Gavrilov
Nikita Balagansky
60
0
0
28 May 2025
Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
Patrick Leask
Neel Nanda
Noura Al Moubayed
87
1
0
23 May 2025
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models
Woody Haosheng Gan
Deqing Fu
Julian Asilis
Ollie Liu
Dani Yogatama
Vatsal Sharan
Robin Jia
Willie Neiswanger
LLMSV
87
1
0
20 May 2025
Explaining Neural Networks with Reasons
Levin Hornischer
Hannes Leitgeb
FAtt
AAML
MILM
95
0
0
20 May 2025
SplInterp: Improving our Understanding and Training of Sparse Autoencoders
Jeremy Budd
Javier Ideami
Benjamin Macdowall Rynne
Keith Duggar
Randall Balestriero
109
0
0
17 May 2025
Probing the Vulnerability of Large Language Models to Polysemantic Interventions
Bofan Gong
Shiyang Lai
Dawn Song
AAML
MILM
61
1
0
16 May 2025
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Rui Melo
Claudia Mamede
Andre Catarino
Rui Abreu
Henrique Lopes Cardoso
130
0
0
15 May 2025
Disentangling Polysemantic Channels in Convolutional Neural Networks
Robin Hesse
Jonas Fischer
Simone Schaub-Meyer
Stefan Roth
FAtt
MILM
108
0
0
17 Apr 2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Adam Karvonen
Can Rager
Johnny Lin
Curt Tigges
Joseph Isaac Bloom
...
Matthew Wearden
Arthur Conmy
Samuel Marks
Neel Nanda
MU
164
23
0
12 Mar 2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
107
1
0
25 Feb 2025
FADE: Why Bad Descriptions Happen to Good Features
Bruno Puri
Aakriti Jain
Elena Golimblevskaia
Patrick Kahardipraja
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
270
1
0
24 Feb 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
141
17
0
23 Feb 2025