ResearchTrend.AI
Toy Models of Superposition


21 September 2022
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
Shauna Kravec
Zac Hatfield-Dodds
R. Lasenby
Dawn Drain
Carol Chen
Roger C. Grosse
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
    AAML
    MILM

Papers citing "Toy Models of Superposition"

7 / 7 papers shown
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
J. Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J. Zhang
Xipeng Qiu
59
0
0
29 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
60
34
0
28 Apr 2025
Naturally Computed Scale Invariance in the Residual Stream of ResNet18
André Longon
46
22
0
22 Apr 2025
Towards Combinatorial Interpretability of Neural Computation
Micah Adler
Dan Alistarh
Nir Shavit
FAtt
67
1
0
10 Apr 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
59
0
0
08 Mar 2025
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
39
35
0
30 Sep 2024
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Frederik Pahde
Maximilian Dreyer
Leander Weber
Moritz Weckbecker
Christopher J. Anders
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
32
7
0
07 Feb 2022