The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

10 June 2022

Papers citing "The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon"

7 / 7 papers shown

Title
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Roman Abramov Felix Steinbauer Gjergji Kasneci 72 0 0 29 Apr 2025
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation Xinyu Zhou Simin Fan Martin Jaggi Jie Fu 23 0 0 24 Apr 2025
Survival of the Fittest Representation: A Case Study with Modular Addition Xiaoman Delores Ding Zifan Carl Guo Eric J. Michaud Ziming Liu Max Tegmark 34 3 0 27 May 2024
Grokking as Compression: A Nonlinear Complexity Perspective Ziming Liu Ziqian Zhong Max Tegmark 30 9 0 09 Oct 2023
Small-scale proxies for large-scale Transformer training instabilities Mitchell Wortsman Peter J. Liu Lechao Xiao Katie Everett A. Alemi ... Jascha Narain Sohl-Dickstein Kelvin Xu Jaehoon Lee Justin Gilmer Simon Kornblith 35 80 0 25 Sep 2023
Understanding Gradient Descent on Edge of Stability in Deep Learning Sanjeev Arora Zhiyuan Li A. Panigrahi MLT 75 88 0 19 May 2022
The large learning rate phase of deep learning: the catapult mechanism Aitor Lewkowycz Yasaman Bahri Ethan Dyer Jascha Narain Sohl-Dickstein Guy Gur-Ari ODL 153 232 0 04 Mar 2020