Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

30 June 2021
Arthur Jacot
François Ged
Berfin Şimşek
Clément Hongler
Franck Gabriel
Abstract

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$. For DLNs of width $w$, we show a phase transition w.r.t. the scaling $\gamma$ of the variance $\sigma^2 = w^{-\gamma}$ as $w \to \infty$: for large variance ($\gamma < 1$), $\theta_0$ is very close to a global minimum but far from any saddle point, and for small variance ($\gamma > 1$), $\theta_0$ is close to a saddle point and far from any global minimum. While the first case corresponds to the well-studied NTK regime, the second case is less understood. This motivates the study of the case $\gamma \to +\infty$, where we conjecture a Saddle-to-Saddle dynamics: throughout training, gradient descent visits the neighborhoods of a sequence of saddles, each corresponding to linear maps of increasing rank, until reaching a sparse global minimum. We support this conjecture with a theorem for the dynamics between the first two saddles, as well as some numerical experiments.
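
To make the conjectured picture concrete, here is a minimal illustrative sketch (not code from the paper): gradient descent on a two-layer linear network with a very small random initialization. The singular values of the end-to-end map tend to emerge roughly one at a time, each plateau corresponding to the neighborhood of a saddle of the next-higher rank. The dimensions, target map, and hyperparameters below are arbitrary choices made for illustration.

```python
# Illustrative sketch of small-initialization training in a deep linear network.
# Loss: L = 0.5 * ||W2 @ W1 - target||_F^2, trained by plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
d, w = 4, 50                     # input/output dimension, hidden width
sigma = 1e-6                     # tiny initialization (large-gamma regime)
lr, steps = 0.01, 20000

# Target linear map with well-separated singular values.
target = np.diag([3.0, 1.5, 0.5, 0.1])

W1 = sigma * rng.standard_normal((w, d))
W2 = sigma * rng.standard_normal((d, w))

for t in range(steps):
    P = W2 @ W1
    E = P - target               # dL/dP
    gW1 = W2.T @ E               # chain rule through the product
    gW2 = E @ W1.T
    W1 -= lr * gW1
    W2 -= lr * gW2
    if t % 1000 == 0:
        sv = np.linalg.svd(W2 @ W1, compute_uv=False)
        print(f"step {t:6d}  singular values: {np.round(sv, 3)}")
```

With a much larger initialization (closer to the $\gamma < 1$ side of the phase transition), the same sketch no longer shows this incremental, rank-by-rank behavior, roughly in line with the contrast the abstract draws with the NTK regime.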
