GrokAlign: Geometric Characterisation and Acceleration of Grokking

14 June 2025
Thomas Walker
Ahmed Imtiaz Humayun
Randall Balestriero
Richard G. Baraniuk
Main: 11 pages; bibliography: 3 pages; appendix: 9 pages; 11 figures; 3 tables
Abstract

A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimising deep networks, a method we introduce as GrokAlign, which we show empirically to induce grokking much sooner than more conventional regularisers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. An accompanying webpage and code are available.
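To make the notion of Jacobian alignment concrete, the following is a minimal sketch for a one-hidden-layer ReLU network, which is continuous piecewise affine, so its Jacobian at an input x is W2 diag(mask) W1, where mask marks the active hidden units. The abstract does not give the paper's exact definition or regulariser, so measuring alignment as the cosine similarity between each Jacobian row and the input is an assumption here, and the function names are hypothetical.

```python
import math


def relu_mlp_jacobian(W1, b1, W2, x):
    """Jacobian at x of f(x) = W2 relu(W1 x + b1) + b2 (b2 does not affect it).

    Inside a linear region the network is affine, so the Jacobian is
    W2 diag(mask) W1, where mask marks the hidden units active at x.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    mask = [1.0 if p > 0 else 0.0 for p in pre]
    return [
        [
            sum(W2[k][h] * mask[h] * W1[h][j] for h in range(len(W1)))
            for j in range(len(x))
        ]
        for k in range(len(W2))
    ]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def jacobian_alignment(W1, b1, W2, x):
    """Assumed alignment measure: cosine of each Jacobian row with the input x."""
    return [cosine(row, x) for row in relu_mlp_jacobian(W1, b1, W2, x)]


# Toy usage: identity hidden layer, one output unit, one input point.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
x = [3.0, 4.0]
print(jacobian_alignment(W1, b1, [[1.0, 1.0]], x))  # row not parallel to x
print(jacobian_alignment(W1, b1, [[3.0, 4.0]], x))  # row parallel to x
```

A regulariser in this spirit would add a penalty that pushes these cosine values towards 1 on the training inputs; how GrokAlign actually weights or normalises such a term is not stated in the abstract.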

@article{walker2025_2506.12284,
  title={GrokAlign: Geometric Characterisation and Acceleration of Grokking},
  author={Thomas Walker and Ahmed Imtiaz Humayun and Randall Balestriero and Richard Baraniuk},
  journal={arXiv preprint arXiv:2506.12284},
  year={2025}
}