Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors

Workshop on Knowledge Extraction and Integration for Deep Learning Architectures; Deep Learning Inside Out (DEELIO), 2021

29 March 2021

Papers citing "Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors"

50 / 80 papers shown

Are Sparse Autoencoders Useful for Java Function Bug Detection?

Henrique Lopes Cardoso

501

10 Apr 2026

REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

377

25 Nov 2025

Anatomy of an Idiom: Tracing Non-Compositionality in Language Models

Andrew Gomes

219

20 Nov 2025

SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs

128

10 Nov 2025

Scaling Non-Parametric Sampling with Representation

156

25 Oct 2025

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

202

14 Oct 2025

Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours

Rui Melo

Rui Abreu

C. Păsăreanu

180

01 Oct 2025

REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model

166

26 Sep 2025

Analysis of Variational Sparse Autoencoders

Zachary Baker

Yuxiao Li

DRL

370

26 Sep 2025

Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing

169

23 Sep 2025

Towards Interpretable Deep Neural Networks for Tabular Data

241

10 Sep 2025

ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders

202

26 Aug 2025

Uncovering Emergent Physics Representations Learned In-Context by Large Language Models

127

17 Aug 2025

BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models

121

09 Aug 2025

Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

Carolina Zheng

Nicolas Beltran-Velez

162

31 Jul 2025

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

245

11 Jul 2025

Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder

474

25 Jun 2025

Stochastic Parameter Decomposition

Lucius Bushnaq

Dan Braun

Lee D. Sharkey

320

25 Jun 2025

Dense SAE Latents Are Features, Not Bugs

Senthooran Rajamanoharan

Mrinmaya Sachan

Max Tegmark

435

18 Jun 2025

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

431

12 Jun 2025

Training Superior Sparse Autoencoders for Instruct Models

Jimmy Chih-Hsien Peng

Min Yang

SyDa

167

09 Jun 2025

Attention-Only Transformers via Unrolled Subspace Denoising

364

04 Jun 2025

Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models

350

22 May 2025

Steering Large Language Models for Machine Translation Personalization

381

22 May 2025

Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations

Yize Zhao

Christos Thrampoulidis

332

13 May 2025

Empirical Evaluation of Progressive Coding for Sparse Autoencoders

Hans Peter

Anders Søgaard

321

30 Apr 2025

Axial-UNet: A Neural Weather Model for Precipitation Nowcasting

Maitreya Sonawane

444

28 Apr 2025

Understanding the Repeat Curse in Large Language Models from a Feature Perspective

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

757

19 Apr 2025

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning

641

03 Apr 2025

Capturing Semantic Flow of ML-based Systems

219

13 Mar 2025

TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Victor Shea-Jay Huang

679

10 Mar 2025

Do Sparse Autoencoders Generalize? A Case Study of Answerability

560

27 Feb 2025

Steered Generation via Gradient Descent on Sparse Features

Sumanta Bhattacharyya

Pedram Rooshenas

LLMSV

407

25 Feb 2025

Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

387

25 Feb 2025

Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization

386

25 Feb 2025

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Subhash Kantamneni

Joshua Engels

Senthooran Rajamanoharan

Max Tegmark

Neel Nanda

449

23 Feb 2025

SAE-V: Interpreting Multimodal Models for Enhanced Alignment

450

22 Feb 2025

Interpretable and Testable Vision Features via Sparse Autoencoders

492

10 Feb 2025

Dictionary Learning: The Complexity of Learning Sparse Superposed Features with Feedback

Akash Kumar

1.1K

08 Feb 2025

Out-of-distribution generalization via composition: a lens through induction heads in Transformers

Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024

Jiajun Song

Zhuoyan Xu

Yiqiao Zhong

404

31 Dec 2024

A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions

ACM Computing Surveys (ACM CSUR), 2024

516

07 Dec 2024

Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Aashiq Muhamed

Mona Diab

Virginia Smith

279

01 Nov 2024

Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

John Wu

David Wu

Jimeng Sun

563

31 Oct 2024

Focus On This, Not That! Steering LLMs with Adaptive Feature Specification

622

30 Oct 2024

One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models

689

28 Oct 2024

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Hongru Wang

369

21 Oct 2024

A Complexity-Based Theory of Compositionality

865

18 Oct 2024

The Geometry of Concepts: Sparse Autoencoder Feature Structure

424

10 Oct 2024

Residual Stream Analysis with Multi-Layer SAEs

International Conference on Learning Representations (ICLR), 2024

482

06 Sep 2024

Understanding Generative AI Content with Embedding Models

757

19 Aug 2024