Unifying Grokking and Double Descent

10 March 2023

Papers citing "Unifying Grokking and Double Descent"


Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Roman Abramov Felix Steinbauer Gjergji Kasneci 51 0 0 29 Apr 2025
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation Xinyu Zhou Simin Fan Martin Jaggi Jie Fu 18 0 0 24 Apr 2025
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model Zhiwei Xu Zhiyu Ni Yixin Wang Wei Hu CLL 32 0 0 17 Apr 2025
How more data can hurt: Instability and regularization in next-generation reservoir computing Yuanzhao Zhang Edmilson Roque dos Santos Sean P. Cornelius 77 2 0 28 Jan 2025
Grokking at the Edge of Linear Separability Alon Beck Noam Levi Yohai Bar-Sinai 24 0 0 06 Oct 2024
Approaching Deep Learning through the Spectral Dynamics of Weights David Yunis Kumar Kshitij Patel Samuel Wheeler Pedro H. P. Savarese Gal Vardi Karen Livescu Michael Maire Matthew R. Walter 34 3 0 21 Aug 2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product Neil Rohit Mallinar Daniel Beaglehole Libin Zhu Adityanarayanan Radhakrishnan Parthe Pandit Misha Belkin 37 7 0 29 Jul 2024
One system for learning and remembering episodes and rules Joshua T. S. Hewson Sabina J. Sloman Marina Dubova CLL 20 0 0 08 Jul 2024
Grokking Modular Polynomials Darshil Doshi Tianyu He Aritra Das Andrey Gromov 29 4 0 05 Jun 2024
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Jaerin Lee Bong Gyun Kang Kihoon Kim Kyoung Mu Lee 25 11 0 30 May 2024
Survival of the Fittest Representation: A Case Study with Modular Addition Xiaoman Delores Ding Zifan Carl Guo Eric J. Michaud Ziming Liu Max Tegmark 29 3 0 27 May 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Boshi Wang Xiang Yue Yu-Chuan Su Huan Sun LRM 16 41 0 23 May 2024
Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition Yufei Huang Shengding Hu Xu Han Zhiyuan Liu Maosong Sun 62 14 0 23 Feb 2024
On Catastrophic Inheritance of Large Foundation Models Hao Chen Bhiksha Raj Xing Xie Jindong Wang AI4CE 48 12 0 02 Feb 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking Kaifeng Lyu Jikai Jin Zhiyuan Li Simon S. Du Jason D. Lee Wei Hu AI4CE 22 32 0 30 Nov 2023
Understanding Grokking Through A Robustness Viewpoint Zhiquan Tan Weiran Huang AAML OOD 25 6 0 11 Nov 2023
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks Gouki Minegishi Yusuke Iwasawa Yutaka Matsuo 11 3 0 30 Oct 2023
Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding Noam Levi Alon Beck Yohai Bar-Sinai 11 16 0 25 Oct 2023
Grokking as the Transition from Lazy to Rich Training Dynamics Tanishq Kumar Blake Bordelon Samuel Gershman C. Pehlevan 20 31 0 09 Oct 2023
Grokking as Compression: A Nonlinear Complexity Perspective Ziming Liu Ziqian Zhong Max Tegmark 12 9 0 09 Oct 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data Zhiwei Xu Yutong Wang Spencer Frei Gal Vardi Wei Hu MLT 11 23 0 04 Oct 2023
Explaining grokking through circuit efficiency Vikrant Varma Rohin Shah Zachary Kenton János Kramár Ramana Kumar 8 47 0 05 Sep 2023
The semantic landscape paradigm for neural networks Shreyas Gokhale 13 2 0 18 Jul 2023
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations Bilal Chughtai Lawrence Chan Neel Nanda 10 96 0 06 Feb 2023
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics Shoaib Ahmed Siddiqui Nitarshan Rajkumar Tegan Maharaj David M. Krueger Sara Hooker 30 27 0 20 Sep 2022
Multi-scale Feature Learning Dynamics: Insights for Double Descent Mohammad Pezeshki Amartya Mitra Yoshua Bengio Guillaume Lajoie 45 25 0 06 Dec 2021