Latent State Models of Training Dynamics

18 August 2023

Papers citing "Latent State Models of Training Dynamics"

8 / 8 papers shown

Title
Distributional Scaling Laws for Emergent Capabilities Rosie Zhao Tian Qin David Alvarez-Melis Sham Kakade Naomi Saphra LRM 32 0 0 24 Feb 2025
On The Specialization of Neural Modules Devon Jarvis Richard Klein Benjamin Rosman Andrew M. Saxe 61 12 0 23 Sep 2024
Survival of the Fittest Representation: A Case Study with Modular Addition Xiaoman Delores Ding Zifan Carl Guo Eric J. Michaud Ziming Liu Max Tegmark 29 3 0 27 May 2024
Omnigrok: Grokking Beyond Algorithmic Data Ziming Liu Eric J. Michaud Max Tegmark 54 76 0 03 Oct 2022
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 240 453 0 24 Sep 2022
Linear Connectivity Reveals Generalization Strategies Jeevesh Juneja Rachit Bansal Kyunghyun Cho João Sedoc Naomi Saphra 232 45 0 24 May 2022
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 226 4,424 0 23 Jan 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 273 2,878 0 15 Sep 2016