Scaling ResNets in the Large-depth Regime

14 June 2022
Pierre Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
Abstract

Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor $\alpha_L$. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics is for $\alpha_L = \frac{1}{\sqrt{L}}$; other choices lead either to explosion or to identity mapping. This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrary to the widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. By contrast, in the latter regime, stability is obtained with specific correlated initializations and $\alpha_L = \frac{1}{L}$. Our analysis suggests a strong interplay between scaling and the regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.
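To make the three regimes concrete, the short NumPy sketch below simulates a scaled residual recursion $x_{k+1} = x_k + \alpha_L V_k x_k$ with i.i.d. Gaussian weights and compares the output norm for $\alpha_L \in \{1, 1/\sqrt{L}, 1/L\}$. This is only a toy illustration of the behaviour described in the abstract, not the authors' experimental setup; the width, the purely linear layers, and the initialization scale are assumptions.

# Minimal sketch of the depth-scaling regimes: a linear residual recursion
# x_{k+1} = x_k + alpha_L * V_k x_k with i.i.d. Gaussian weight matrices.
# Illustrative toy only; the paper's actual architecture and experiments differ.
import numpy as np

def final_norm(L, alpha, d=64, seed=0):
    """Norm of the final hidden state of a depth-L scaled linear ResNet."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d) / np.sqrt(d)            # unit-norm input (in expectation)
    for _ in range(L):
        V = rng.standard_normal((d, d)) / np.sqrt(d)   # standard i.i.d. initialization
        x = x + alpha * V @ x                          # scaled residual update
    return np.linalg.norm(x)

L = 1000
for label, alpha in [("alpha_L = 1", 1.0),
                     ("alpha_L = 1/sqrt(L)", 1 / np.sqrt(L)),
                     ("alpha_L = 1/L", 1 / L)]:
    print(f"{label:20s} -> ||x_L|| = {final_norm(L, alpha):.3e}")

# Expected trend: the output norm explodes for alpha_L = 1, stays of order one
# for alpha_L = 1/sqrt(L), and remains essentially the input (near-identity map)
# for alpha_L = 1/L, matching the i.i.d.-initialization regimes in the abstract.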

@article{marion2025_2206.06929,
  title={Scaling ResNets in the Large-depth Regime},
  author={Pierre Marion and Adeline Fermanian and Gérard Biau and Jean-Philippe Vert},
  journal={arXiv preprint arXiv:2206.06929},
  year={2025}
}