Grokking at the Edge of Numerical Stability

International Conference on Learning Representations (ICLR), 2025

8 January 2025

Papers citing "Grokking at the Edge of Numerical Stability"

When Data Falls Short: Grokking Below the Critical Threshold

Vaibhav Singh

Eugene Belilovsky

Rahaf Aljundi

138

06 Nov 2025

Adversarial Attacks Leverage Interference Between Features in Superposition

139

13 Oct 2025

Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking

Ali Saheb Pasand

Elvis Dohmatob

158

06 Oct 2025

Beyond the Linear Separability Ceiling: Aligning Representations in VLMs

338

10 Jul 2025

Hierarchical Reasoning Model

659

26 Jun 2025

Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Ziyue Li

Chenrui Fan

Tianyi Zhou

483

26 Jun 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

...

997

214

29 Apr 2025

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation

290

24 Apr 2025

Muon Optimizer Accelerates Grokking

Amund Tveit

Bjørn Remseth

Arve Skogvold

293

22 Apr 2025

Grokking at the Edge of Linear Separability

Alon Beck

Noam Levi

Yohai Bar-Sinai

464

06 Oct 2024

Language Models "Grok" to Copy

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Xingwu Sun

Rui Yan

446

14 Sep 2024

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Neil Rohit Mallinar

Daniel Beaglehole

Libin Zhu

Adityanarayanan Radhakrishnan

Parthe Pandit

Misha Belkin

430

29 Jul 2024

Deep Networks Always Grok and Here is Why

Ahmed Imtiaz Humayun

Randall Balestriero

Richard Baraniuk

AAML OOD AI4CE

536

23 Feb 2024

Grokking Group Multiplication with Cosets

371

11 Dec 2023

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

International Conference on Learning Representations (ICLR), 2023

425

30 Nov 2023

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

International Conference on Machine Learning (ICML), 2023

Mingze Wang

Zeping Min

Lei Wu

568

24 Nov 2023

Grokking as the Transition from Lazy to Rich Training Dynamics

International Conference on Learning Representations (ICLR), 2023

419

09 Oct 2023

Why Do We Need Weight Decay in Modern Deep Learning?

Neural Information Processing Systems (NeurIPS), 2023

Maksym Andriushchenko

Francesco D'Angelo

Aditya Varre

Nicolas Flammarion

410

06 Oct 2023

Grokking as a First Order Phase Transition in Two Layer Networks

International Conference on Learning Representations (ICLR), 2023

Noa Rubin

Inbar Seroussi

Zohar Ringel

347

05 Oct 2023

Explaining grokking through circuit efficiency

311

05 Sep 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

International Conference on Machine Learning (ICML), 2023

Atli Kosson

Bettina Messmer

Martin Jaggi

551

26 May 2023

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

International Conference on Machine Learning (ICML), 2023

Bilal Chughtai

Lawrence Chan

Neel Nanda

420

141

06 Feb 2023

Progress measures for grokking via mechanistic interpretability

International Conference on Learning Representations (ICLR), 2023

643

728

12 Jan 2023

Grokking modular arithmetic

Andrey Gromov

291

06 Jan 2023

Grokking phase transitions in learning local rules with gradient descent

Journal of Machine Learning Research (JMLR), 2022

Bojan Žunkovič

E. Ilievski

348

26 Oct 2022

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

International Conference on Learning Representations (ICLR), 2022

D. Kunin

Atsushi Yamamura

Chao Ma

Surya Ganguli

222

07 Oct 2022

Omnigrok: Grokking Beyond Algorithmic Data

International Conference on Learning Representations (ICLR), 2022

Ziming Liu

Eric J. Michaud

Max Tegmark

441

126

03 Oct 2022

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Neural Information Processing Systems (NeurIPS), 2022

471

177

18 Jul 2022

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

311

10 Jun 2022

DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning

International Conference on Software Engineering (ICSE), 2022

Eliska Kloberdanz

Kyle G. Kloberdanz

Wei Le

290

07 Feb 2022

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

492

571

06 Jan 2022

Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

Neural Information Processing Systems (NeurIPS), 2021

303

25 Nov 2021

Directional convergence and alignment in deep learning

Neural Information Processing Systems (NeurIPS), 2020

Ziwei Ji

Matus Telgarsky

396

210

11 Jun 2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

International Conference on Learning Representations (ICLR), 2019

Kaifeng Lyu

Jian Li

579

395

13 Jun 2019

On Lazy Training in Differentiable Programming

Lénaïc Chizat

Edouard Oyallon

Francis R. Bach

725

951

19 Dec 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

3.1K

112,756

11 Oct 2018

Gradient descent aligns the layers of deep linear networks

Ziwei Ji

Matus Telgarsky

401

290

04 Oct 2018

Risk and parameter convergence of logistic regression

Ziwei Ji

Matus Telgarsky

398

141

20 Mar 2018

PPFNet: Global Context Aware Local Features for Robust 3D Point Matching

322

627

07 Feb 2018

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

1.4K

17,228

02 Dec 2016

ImageNet Large Scale Visual Recognition Challenge

International Journal of Computer Vision (IJCV), 2014

...

Li Fei-Fei

3.7K

42,317

01 Sep 2014