Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2209.10652
Cited By
Toy Models of Superposition
21 September 2022
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
Shauna Kravec
Zac Hatfield-Dodds
R. Lasenby
Dawn Drain
Carol Chen
Roger C. Grosse
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Toy Models of Superposition"
7 / 7 papers shown
Title
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
J. Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J. Zhang
Xipeng Qiu
59
0
0
29 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
60
34
0
28 Apr 2025
Naturally Computed Scale Invariance in the Residual Stream of ResNet18
André Longon
46
22
0
22 Apr 2025
Towards Combinatorial Interpretability of Neural Computation
Micah Adler
Dan Alistarh
Nir Shavit
FAtt
67
1
0
10 Apr 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
59
0
0
08 Mar 2025
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
39
35
0
30 Sep 2024
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Frederik Pahde
Maximilian Dreyer
Leander Weber
Moritz Weckbecker
Christopher J. Anders
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
32
7
0
07 Feb 2022
1