Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

10 October 2023

Jake Mendel

Papers citing "Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition"

Title | Authors | Tags | Counts | Date
Modes of Sequence Models and Learning Coefficients | Zhongtian Chen, Daniel Murfet | - | 77 1 0 | 25 Apr 2025
Emergence of Computational Structure in a Neural Network Physics Simulator | Rohan Hitchcock, Gary W. Delaney, J. Manton, Richard Scalzo, Jingge Zhu | - | 22 0 0 | 16 Apr 2025
Almost Bayesian: The Fractal Dynamics of Stochastic Gradient Descent | Max Hennick, Stijn De Baerdemacker | - | 36 0 0 | 28 Mar 2025
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness | Qi Zhang, Yifei Wang, Jingyi Cui, Xiang Pan, Qi Lei, Stefanie Jegelka, Yisen Wang | AAML | 29 1 0 | 27 Oct 2024
The Persian Rug: solving toy models of superposition using large-scale symmetries | Aditya Cowsik, Kfir Dolev, Alex Infanger | - | 19 0 0 | 15 Oct 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient | George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet | OffRL | 15 7 0 | 03 Oct 2024
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability | Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn | - | 19 7 0 | 17 May 2024
Mechanistic Interpretability for AI Safety -- A Review | Leonard Bereska, E. Gavves | AI4CE | 38 111 0 | 22 Apr 2024
Estimating the Local Learning Coefficient at Scale | Zach Furman, Edmund Lau | - | 17 3 0 | 06 Feb 2024
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics | Emmanuel Abbe, Enric Boix-Adserà, Theodor Misiakiewicz | FedML, MLT | 76 72 0 | 21 Feb 2023
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | - | 240 453 0 | 24 Sep 2022
Toy Models of Superposition | Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah | AAML, MILM | 120 314 0 | 21 Sep 2022