ResearchTrend.AI


  3. 2209.10652

Toy Models of Superposition

21 September 2022
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
Tom Henighan
Shauna Kravec
Zac Hatfield-Dodds
Robert Lasenby
Dawn Drain
Carol Chen
Roger C. Grosse
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
Christopher Olah
Abstract

Neural networks often pack many unrelated concepts into a single neuron, a puzzling phenomenon known as "polysemanticity" that makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.
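The setup the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: the fixed weight matrix with three feature directions spaced 120° apart in a two-dimensional hidden space is an illustrative assumption (one of the polytope geometries the paper discusses), used here to show how sparse features survive a ReLU(WᵀWx) reconstruction while dense inputs interfere.

```python
import numpy as np

# Three unit feature directions packed into two hidden dimensions,
# spaced 120 degrees apart (an assumed "triangle" arrangement, in the
# spirit of the polytope geometries discussed in the paper).
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
W = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 3)

def toy_model(x, W):
    """Reconstruct features via ReLU(W^T W x); bias omitted for simplicity."""
    h = W @ x                         # compress 3 features into 2 dims
    return np.maximum(W.T @ h, 0.0)   # ReLU clips negative interference

# Sparse (one-hot) inputs are recovered exactly: the interference from
# the other two features is cos(120°) = -0.5, which the ReLU clips.
for i in range(3):
    x = np.zeros(3)
    x[i] = 1.0
    assert np.allclose(toy_model(x, W), x)

# Dense inputs interfere: the three 120°-spaced directions sum to zero,
# so an all-ones input cannot be recovered at all.
print(toy_model(np.ones(3), W))  # → [0. 0. 0.]
```

The point of the sketch is the trade-off superposition exploits: when features are sparse, more features than dimensions can coexist with interference that nonlinearities filter out; when features co-occur densely, the interference destroys the reconstruction.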
