Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

7 October 2021
Yuqing Wang
Minshuo Chen
T. Zhao
Molei Tao
    AI4CE
arXiv:2110.03677
Abstract

Recent empirical advances show that training deep models with a large learning rate often improves generalization performance. However, theoretical justifications for the benefits of a large learning rate are highly limited, due to challenges in analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem, i.e., $\min_{X, Y} \|A - XY^\top\|_{\sf F}^2$. We prove a convergence theory for constant large learning rates well beyond $2/L$, where $L$ is the largest eigenvalue of the Hessian at initialization. Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing': the magnitudes of $X$ and $Y$ at the limit of the GD iterations will be close even if their initialization is significantly unbalanced. Numerical experiments are provided to support our theory.
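
The balancing effect described in the abstract is easy to probe numerically. Below is a minimal NumPy sketch, not the paper's own experiment: the dimensions, initialization scales, learning rates, and step count are illustrative assumptions. It runs GD on $\min_{X, Y} \|A - XY^\top\|_{\sf F}^2$ from a deliberately unbalanced initialization, once with a learning rate below $2/L$ at initialization and once beyond it, and compares the final magnitudes of $X$ and $Y$.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's experiment): GD on
# min_{X,Y} ||A - X Y^T||_F^2 from an unbalanced initialization,
# comparing a learning rate below 2/L at initialization with one beyond it.
rng = np.random.default_rng(0)
d, r = 5, 2
A = rng.standard_normal((d, d))
A /= np.linalg.norm(A, 2)                # normalize A to unit spectral norm

X0 = rng.standard_normal((d, r))
X0 *= 4.0 / np.linalg.norm(X0, 2)        # large X0: spectral norm 4
Y0 = 0.01 * rng.standard_normal((d, r))  # tiny Y0: heavily unbalanced start

def run_gd(lr, steps=3000):
    X, Y = X0.copy(), Y0.copy()
    for _ in range(steps):
        R = X @ Y.T - A                  # residual X Y^T - A
        gX, gY = 2 * R @ Y, 2 * R.T @ X  # gradients w.r.t. X and Y
        X, Y = X - lr * gX, Y - lr * gY  # simultaneous GD update
    return np.linalg.norm(X @ Y.T - A), np.linalg.norm(X), np.linalg.norm(Y)

# At initialization, L is roughly 2 * sigma_max(X0)^2 = 32, i.e. 2/L ~ 0.06.
for lr in (0.02, 0.08):                  # below vs. beyond 2/L at initialization
    loss, nx, ny = run_gd(lr)
    print(f"lr={lr}: residual={loss:.3f}  ||X||_F={nx:.2f}  ||Y||_F={ny:.2f}")
# Expected trend: the small learning rate leaves ||X||_F >> ||Y||_F, while the
# larger one drives the two magnitudes close together (the balancing effect).
```

With the small learning rate, the quantity $\|X\|_{\sf F}^2 - \|Y\|_{\sf F}^2$ stays close to its (large) initial value, so the factors remain unbalanced; with the larger learning rate the norms end up at comparable magnitudes, which is the balancing behavior the paper establishes.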
