Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model

17 April 2025
Zhiwei Xu, Zhiyu Ni, Yixin Wang, Wei Hu
Abstract

"Grokking" is a phenomenon where a neural network first memorizes training data and generalizes poorly, then suddenly transitions to near-perfect generalization after prolonged training. While intriguing, this delayed generalization compromises predictability and efficiency. Ideally, models should generalize directly, without delay. To this end, this paper proposes GrokTransfer, a simple and principled method for accelerating grokking when training neural networks, based on the key observation that the data embedding plays a crucial role in determining whether generalization is delayed. GrokTransfer first trains a smaller, weaker model to a nontrivial (but far from optimal) test performance. The learned input embedding from this weaker model is then extracted and used to initialize the embedding in the target, stronger model. We rigorously prove that, on a synthetic XOR task where delayed generalization always occurs under normal training, GrokTransfer enables the target model to generalize directly, without delay. Moreover, we demonstrate empirically that, across different tasks, GrokTransfer effectively reshapes the training dynamics and eliminates delayed generalization, for both fully connected neural networks and Transformers.

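The abstract describes a three-step recipe: train a weak model, lift its learned embedding into a stronger model, then train the stronger model normally. Below is a minimal PyTorch sketch of that recipe under illustrative assumptions: the modular-addition toy task, the model widths, the training budget, and the zero-padding used to map the small embedding into the wider one are placeholders, not the paper's exact setup.

```python
# Minimal sketch of the GrokTransfer recipe from the abstract. All task and
# architecture choices here (modular addition, widths, steps, zero-padding)
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

P = 97  # toy task: predict (a + b) mod P from the token pair (a, b)

def make_data():
    pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
    labels = (pairs[:, 0] + pairs[:, 1]) % P
    return pairs, labels

class MLP(nn.Module):
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(P, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, P),
        )

    def forward(self, x):
        e = self.embed(x)              # (batch, 2, embed_dim)
        return self.net(e.flatten(1))  # (batch, P) logits

def train(model, x, y, steps, lr=1e-3, weight_decay=1.0):
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

x, y = make_data()
train_x, train_y = x[::2], y[::2]  # crude 50% train split

# Step 1: train a smaller, weaker model to nontrivial (but far from
# optimal) test performance.
weak = train(MLP(embed_dim=8, hidden_dim=32), train_x, train_y, steps=2000)

# Step 2: initialize the target model's embedding from the weak model.
# The target embedding is wider here, so the learned weak embedding is
# placed in the leading coordinates and the rest zeroed -- one simple
# choice; the paper's mapping between dimensions may differ.
target = MLP(embed_dim=64, hidden_dim=256)
with torch.no_grad():
    target.embed.weight[:, :8] = weak.embed.weight
    target.embed.weight[:, 8:] = 0.0

# Step 3: train the target model as usual; per the abstract, this
# initialization reshapes the dynamics so generalization is not delayed.
train(target, train_x, train_y, steps=2000)
```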
View on arXiv
@article{xu2025_2504.13292,
  title={Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model},
  author={Zhiwei Xu and Zhiyu Ni and Yixin Wang and Wei Hu},
  journal={arXiv preprint arXiv:2504.13292},
  year={2025}
}