Grokking at the Edge of Numerical Stability

International Conference on Learning Representations (ICLR), 2025
8 January 2025
Lucas Prieto
Melih Barsbey
Pedro A.M. Mediano
Tolga Birdal

Papers citing "Grokking at the Edge of Numerical Stability"

41 / 41 papers shown
When Data Falls Short: Grokking Below the Critical Threshold
Vaibhav Singh
Eugene Belilovsky
Rahaf Aljundi
138
0
0
06 Nov 2025
Adversarial Attacks Leverage Interference Between Features in Superposition
Edward Stevinson
Lucas Prieto
Melih Barsbey
Tolga Birdal
AAML
139
3
0
13 Oct 2025
Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
Ali Saheb Pasand
Elvis Dohmatob
158
1
0
06 Oct 2025
Beyond the Linear Separability Ceiling: Aligning Representations in VLMs
Enrico Vompa
Tanel Tammet
Mohit Vaishnav
VLM, LRM
338
0
0
10 Jul 2025
Hierarchical Reasoning Model
Guan Wang
Jin Li
Yuhao Sun
Xing Chen
Changling Liu
Yue Wu
Meng Lu
Sen Song
Yasin Abbasi Yadkori
LRM
659
72
0
26 Jun 2025
Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Ziyue Li
Chenrui Fan
Tianyi Zhou
483
4
0
26 Jun 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Haoran Pan
OffRL, ReLM, LRM
997
214
0
29 Apr 2025
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
Xinyu Zhou
Simin Fan
Martin Jaggi
Jie Fu
290
2
0
24 Apr 2025
Muon Optimizer Accelerates Grokking
Amund Tveit
Bjørn Remseth
Arve Skogvold
293
6
0
22 Apr 2025
Grokking at the Edge of Linear Separability
Alon Beck
Noam Levi
Yohai Bar-Sinai
464
6
0
06 Oct 2024
Language Models "Grok" to Copy
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Ang Lv
Ruobing Xie
Xingwu Sun
Zhanhui Kang
Rui Yan
LLMAG
446
4
0
14 Sep 2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Rohit Mallinar
Daniel Beaglehole
Libin Zhu
Adityanarayanan Radhakrishnan
Parthe Pandit
Misha Belkin
430
20
0
29 Jul 2024
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun
Randall Balestriero
Richard Baraniuk
AAML, OOD, AI4CE
536
51
0
23 Feb 2024
Grokking Group Multiplication with Cosets
Dashiell Stander
Qinan Yu
Honglu Fan
Stella Biderman
371
20
0
11 Dec 2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
International Conference on Learning Representations (ICLR), 2023
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
425
70
0
30 Nov 2023
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
International Conference on Machine Learning (ICML), 2023
Mingze Wang
Zeping Min
Lei Wu
568
4
0
24 Nov 2023
Grokking as the Transition from Lazy to Rich Training Dynamics
International Conference on Learning Representations (ICLR), 2023
Tanishq Kumar
Blake Bordelon
Samuel Gershman
Cengiz Pehlevan
419
83
0
09 Oct 2023
Why Do We Need Weight Decay in Modern Deep Learning?
Neural Information Processing Systems (NeurIPS), 2023
Maksym Andriushchenko
Francesco D'Angelo
Aditya Varre
Nicolas Flammarion
410
75
0
06 Oct 2023
Grokking as a First Order Phase Transition in Two Layer Networks
International Conference on Learning Representations (ICLR), 2023
Noa Rubin
Inbar Seroussi
Zohar Ringel
347
41
0
05 Oct 2023
Explaining grokking through circuit efficiency
Vikrant Varma
Rohin Shah
Zachary Kenton
János Kramár
Ramana Kumar
311
87
0
05 Sep 2023
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
International Conference on Machine Learning (ICML), 2023
Atli Kosson
Bettina Messmer
Martin Jaggi
551
36
0
26 May 2023
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
International Conference on Machine Learning (ICML), 2023
Bilal Chughtai
Lawrence Chan
Neel Nanda
420
141
0
06 Feb 2023
Progress measures for grokking via mechanistic interpretability
International Conference on Learning Representations (ICLR), 2023
Neel Nanda
Lawrence Chan
Tom Lieberum
Jess Smith
Jacob Steinhardt
643
728
0
12 Jan 2023
Grokking modular arithmetic
Andrey Gromov
291
69
0
06 Jan 2023
Grokking phase transitions in learning local rules with gradient descent
Journal of Machine Learning Research (JMLR), 2022
Bojan Žunkovič
E. Ilievski
348
26
0
26 Oct 2022
The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks
International Conference on Learning Representations (ICLR), 2022
D. Kunin
Atsushi Yamamura
Chao Ma
Surya Ganguli
222
26
0
07 Oct 2022
Omnigrok: Grokking Beyond Algorithmic Data
International Conference on Learning Representations (ICLR), 2022
Ziming Liu
Eric J. Michaud
Max Tegmark
441
126
0
03 Oct 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Neural Information Processing Systems (NeurIPS), 2022
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
471
177
0
18 Jul 2022
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Vimal Thilak
Etai Littwin
Shuangfei Zhai
Omid Saremi
Roni Paiss
J. Susskind
311
83
0
10 Jun 2022
DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning
International Conference on Software Engineering (ICSE), 2022
Eliska Kloberdanz
Kyle G. Kloberdanz
Wei Le
290
24
0
07 Feb 2022
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Alethea Power
Yuri Burda
Harrison Edwards
Igor Babuschkin
Vedant Misra
492
571
0
06 Jan 2022
Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
Neural Information Processing Systems (NeurIPS), 2021
Tolga Birdal
Aaron Lou
Leonidas Guibas
Umut Şimşekli
303
86
0
25 Nov 2021
Directional convergence and alignment in deep learning
Neural Information Processing Systems (NeurIPS), 2020
Ziwei Ji
Matus Telgarsky
396
210
0
11 Jun 2020
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
International Conference on Learning Representations (ICLR), 2019
Kaifeng Lyu
Jian Li
579
395
0
13 Jun 2019
On Lazy Training in Differentiable Programming
Lénaïc Chizat
Edouard Oyallon
Francis R. Bach
725
951
0
19 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
3.1K
112,756
0
11 Oct 2018
Gradient descent aligns the layers of deep linear networks
Ziwei Ji
Matus Telgarsky
401
290
0
04 Oct 2018
Risk and parameter convergence of logistic regression
Ziwei Ji
Matus Telgarsky
398
141
0
20 Mar 2018
PPFNet: Global Context Aware Local Features for Robust 3D Point Matching
Haowen Deng
Tolga Birdal
Slobodan Ilic
3DV, 3DPC
322
627
0
07 Feb 2018
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas Guibas
3DH, 3DPC, 3DV, PINN
1.4K
17,228
0
02 Dec 2016
ImageNet Large Scale Visual Recognition Challenge
International Journal of Computer Vision (IJCV), 2014
Olga Russakovsky
Gaowen Liu
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM, ObjD
3.7K
42,317
0
01 Sep 2014