2103.15949
Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors
Workshop on Knowledge Extraction and Integration for Deep Learning Architectures; Deep Learning Inside Out (DEELIO), 2021
29 March 2021
Zeyu Yun
Yubei Chen
Bruno A. Olshausen
Yann LeCun
Github (42★)
Papers citing
"Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors"
50 / 80 papers shown
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Rui Melo
Claudia Mamede
Andre Catarino
Rui Abreu
Henrique Lopes Cardoso
501
1
0
10 Apr 2026
REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance
Chuyi Kong
Gao Wei
Jing Ma
Hongzhan Lin
Yaxin Fan
KELM
HILM
377
0
0
25 Nov 2025
Anatomy of an Idiom: Tracing Non-Compositionality in Language Models
Andrew Gomes
219
0
0
20 Nov 2025
SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs
Sean P. Fillingham
Andrew Gordon
Peter Lai
Xavier Poncini
David Quarel
Stefan Heimersheim
128
0
0
10 Nov 2025
Scaling Non-Parametric Sampling with Representation
Vincent Lu
Aaron Truong
Zeyu Yun
Yubei Chen
DiffM
156
0
0
25 Oct 2025
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Julian Minder
Clement Dumas
Stewart Slocum
Helena Casademunt
Cameron Holmes
Robert West
Neel Nanda
202
6
0
14 Oct 2025
Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours
Rui Melo
Rui Abreu
C. Păsăreanu
180
1
0
01 Oct 2025
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
Bo Li
Guanzhi Deng
Ronghao Chen
Junrong Yue
Shuo Zhang
Qinghua Zhao
Linqi Song
Lijie Wen
LRM
166
1
0
26 Sep 2025
Analysis of Variational Sparse Autoencoders
Zachary Baker
Yuxiao Li
DRL
370
0
0
26 Sep 2025
Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing
Sabri Boughorbel
Fahim Dalvi
Nadir Durrani
Majd Hawasly
169
1
0
23 Sep 2025
Towards Interpretable Deep Neural Networks for Tabular Data
Khawla Elhadri
Jorg Schlotterer
Christin Seifert
LMTD
XAI
241
1
0
10 Sep 2025
ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders
Xiangyu Liu
Haodi Lei
Yi Liu
Y. Liu
Wei Hu
202
2
0
26 Aug 2025
Uncovering Emergent Physics Representations Learned In-Context by Large Language Models
Yeongwoo Song
Jaeyong Bae
Dong-Kyum Kim
Hawoong Jeong
AI4CE
LRM
127
0
0
17 Aug 2025
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang
Yubo Wang
Haoyu Cao
Linli Xu
121
2
0
09 Aug 2025
Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders
Carolina Zheng
Nicolas Beltran-Velez
Sweta Karlekar
Claudia Shi
Achille Nazaret
Asif Mallik
Amir Feder
David M. Blei
BDL
162
2
0
31 Jul 2025
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter
Julian Minder
Thomas Hofmann
Tiago Pimentel
245
12
0
11 Jul 2025
Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder
Yingji Zhang
Danilo S. Carvalho
André Freitas
CoGe
474
0
0
25 Jun 2025
Stochastic Parameter Decomposition
Lucius Bushnaq
Dan Braun
Lee D. Sharkey
320
8
0
25 Jun 2025
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun
Alessandro Stolfo
Joshua Engels
Ben Wu
Senthooran Rajamanoharan
Mrinmaya Sachan
Max Tegmark
435
7
0
18 Jun 2025
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
Or Shafran
Atticus Geiger
Mor Geva
MILM
431
4
0
12 Jun 2025
Training Superior Sparse Autoencoders for Instruct Models
Jiaming Li
Haoran Ye
Yukun Chen
Xinyue Li
Lei Zhang
Hamid Alinejad-Rokny
Jimmy Chih-Hsien Peng
Min Yang
SyDa
167
1
0
09 Jun 2025
Attention-Only Transformers via Unrolled Subspace Denoising
Peng Wang
Yifu Lu
Yaodong Yu
Druv Pai
Qing Qu
Yi Ma
ViT
364
4
0
04 Jun 2025
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
Jiachen Jiang
Jinxin Zhou
Bo Peng
Xia Ning
Zhihui Zhu
350
4
0
22 May 2025
Steering Large Language Models for Machine Translation Personalization
Daniel Scalena
Gabriele Sarti
Arianna Bisazza
Elisabetta Fersini
Malvina Nissim
LLMSV
381
0
0
22 May 2025
Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations
Yize Zhao
Christos Thrampoulidis
332
0
0
13 May 2025
Empirical Evaluation of Progressive Coding for Sparse Autoencoders
Hans Peter
Anders Søgaard
321
0
0
30 Apr 2025
Axial-UNet: A Neural Weather Model for Precipitation Nowcasting
Maitreya Sonawane
444
0
0
28 Apr 2025
Understanding the Repeat Curse in Large Language Models from a Feature Perspective
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junchi Yao
Shu Yang
Jianhua Xu
Lijie Hu
Mengdi Li
Di Wang
757
27
0
19 Apr 2025
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder
Clement Dumas
Caden Juang
Bilal Chughtai
Neel Nanda
641
1
0
03 Apr 2025
Capturing Semantic Flow of ML-based Systems
S. Yoo
R. Feldt
Somin Kim
Naryeong Kim
219
0
0
13 Mar 2025
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Victor Shea-Jay Huang
Le Zhuo
Yi Xin
Zhaokai Wang
Peng Gao
Jiaming Song
Renrui Zhang
Shiyang Feng
Hongsheng Li
DiffM
679
9
0
10 Mar 2025
Do Sparse Autoencoders Generalize? A Case Study of Answerability
Lovis Heindrich
Juil Sock
Fazl Barez
Veronika Thost
560
7
0
27 Feb 2025
Steered Generation via Gradient Descent on Sparse Features
Sumanta Bhattacharyya
Pedram Rooshenas
LLMSV
407
0
0
25 Feb 2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
387
6
0
25 Feb 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan
Addison Salvador
Michelle A. Smeaton
Andrew Glaws
Hilary Egan
Brian C. Wyatt
Babak Anasori
K. Fiedler
M. Olszta
Steven Spurgeon
386
9
0
25 Feb 2025
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
449
70
0
23 Feb 2025
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou
Changye Li
Yalan Qin
Yaodong Yang
450
13
0
22 Feb 2025
Interpretable and Testable Vision Features via Sparse Autoencoders
Samuel Stevens
Wei-Lun Chao
T. Berger-Wolf
Yu-Chuan Su
VLM
492
17
0
10 Feb 2025
Dictionary Learning: The Complexity of Learning Sparse Superposed Features with Feedback
Akash Kumar
1.1K
0
0
08 Feb 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
404
27
0
31 Dec 2024
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions
ACM Computing Surveys (ACM CSUR), 2024
Ola Shorinwa
Zhiting Mei
Justin Lidard
Allen Z. Ren
Anirudha Majumdar
HILM
LRM
516
19
0
07 Dec 2024
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Aashiq Muhamed
Mona Diab
Virginia Smith
279
12
0
01 Nov 2024
Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
John Wu
David Wu
Jimeng Sun
563
3
0
31 Oct 2024
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Juil Sock
Francesco Pinto
622
5
0
30 Oct 2024
One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
Viacheslav Surkov
Chris Wendler
Antonio Mari
Mikhail Terekhov
Justin Deschenaux
Robert West
Çağlar Gülçehre
David Bau
VLM
689
14
0
28 Oct 2024
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yu Zhao
Alessio Devoto
Giwon Hong
Xiaotang Du
Aryo Pradipta Gema
Hongru Wang
Xuanli He
Kam-Fai Wong
Pasquale Minervini
KELM
LLMSV
369
53
0
21 Oct 2024
A Complexity-Based Theory of Compositionality
Eric Elmoznino
Thomas Jiralerspong
Yoshua Bengio
Guillaume Lajoie
CoGe
865
16
0
18 Oct 2024
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Yuxiao Li
Eric J. Michaud
David D. Baek
Joshua Engels
Xiaoqing Sun
Max Tegmark
424
42
0
10 Oct 2024
Residual Stream Analysis with Multi-Layer SAEs
International Conference on Learning Representations (ICLR), 2024
Tim Lawson
Lucy Farnik
Conor Houghton
Laurence Aitchison
482
13
0
06 Sep 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
757
6
0
19 Aug 2024