Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks

18 October 2023
Mufan Bill Li
Mihai Nica
Abstract

Recent analyses of neural networks with shaped activations (i.e., the activation function is scaled as the network size grows) have led to scaling limits described by differential equations. However, these results do not a priori tell us anything about "ordinary" unshaped networks, where the activation is unchanged as the network size grows. In this article, we find a similar differential-equation-based asymptotic characterization for two types of unshaped networks. Firstly, we show that the following two architectures converge to the same infinite-depth-and-width limit at initialization: (i) a fully connected ResNet with a $d^{-1/2}$ factor on the residual branch, where $d$ is the network depth; (ii) a multilayer perceptron (MLP) with depth $d \ll$ width $n$ and shaped ReLU activation at rate $d^{-1/2}$. Secondly, for an unshaped MLP at initialization, we derive the first-order asymptotic correction to the layerwise correlation. In particular, if $\rho_\ell$ is the correlation at layer $\ell$, then $q_t = \ell^2 (1 - \rho_\ell)$ with $t = \frac{\ell}{n}$ converges to an SDE with a singularity at $t = 0$. Together, these results provide a connection between shaped and unshaped network architectures, and open up the possibility of studying the effect of normalization methods and how they connect with shaping activation functions.
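To make the two architectures in result (i)–(ii) concrete, below is a minimal NumPy sketch (not taken from the paper) that builds both networks at initialization and compares how the correlation between two inputs evolves after $d$ layers. The width, depth, weight variances, the shaped-ReLU slopes $1 \pm d^{-1/2}$, and the use of cosine similarity as a proxy for the layerwise correlation $\rho_\ell$ are all illustrative assumptions, not the authors' exact parameterization.

```python
# Illustrative sketch only: Gaussian initialization, the slopes 1 +/- d^{-1/2}
# for the shaped ReLU, and cosine similarity as a stand-in for the layerwise
# correlation rho_ell are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 64  # width n, depth d, chosen so that d << n

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# (i) Fully connected ResNet with a d^{-1/2} factor on the residual branch.
def resnet_init_forward(X, depth=d, width=n):
    # X: (width, num_inputs); the same random weights act on every input column.
    H = X.copy()
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        H = H + depth ** -0.5 * np.maximum(W @ H, 0.0)
    return H

# (ii) MLP whose ReLU is "shaped": its slopes approach the identity at rate d^{-1/2}.
def shaped_relu(x, depth=d):
    s_plus, s_minus = 1.0 + depth ** -0.5, 1.0 - depth ** -0.5
    return s_plus * np.maximum(x, 0.0) + s_minus * np.minimum(x, 0.0)

def shaped_mlp_init_forward(X, depth=d, width=n):
    H = X.copy()
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(1.0 / width), size=(width, width))
        H = shaped_relu(W @ H, depth)
    return H

# Two correlated inputs, pushed through each network at initialization.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1.0 - 0.9 ** 2) * rng.normal(size=n)
X = np.stack([x1, x2], axis=1)

H_res = resnet_init_forward(X)
H_mlp = shaped_mlp_init_forward(X)
print("ResNet (d^{-1/2} residual) output correlation:", cosine(H_res[:, 0], H_res[:, 1]))
print("Shaped-ReLU MLP output correlation:           ", cosine(H_mlp[:, 0], H_mlp[:, 1]))
```

Running the sketch for a few growing choices of $n$ and $d$ is meant to echo, qualitatively, the abstract's claim that the two initializations share the same infinite-depth-and-width limit; the paper's precise statement concerns that limit, not any fixed finite size.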

View on arXiv: https://arxiv.org/abs/2310.12079