arXiv:2304.03385

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

6 April 2023
Luís Carvalho
João L. Costa
José Mourão
Gonçalo Oliveira
Abstract

Recent developments in applications of artificial neural networks with over $n = 10^{14}$ parameters make it extremely important to study the behavior of such networks for large $n$. Most works studying wide neural networks have focused on the infinite-width limit $n \to +\infty$ and have shown that, at initialization, such networks correspond to Gaussian processes. In this work we study their behavior for large but finite $n$. Our main contributions are the following:

(1) The computation of the corrections to Gaussianity in terms of an asymptotic series in $n^{-\frac{1}{2}}$. The coefficients in this expansion are determined by the statistics of the parameter initialization and by the activation function.

(2) Controlling the evolution of the outputs of networks of finite width $n$ during training, by computing their deviations from the limiting infinite-width case (in which the network evolves through a linear flow). This improves previous estimates and yields sharper decay rates in $n$ for the (finite-width) NTK, valid during the entire training procedure. As a corollary, we also prove that, with arbitrarily high probability, the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function.

(3) Estimating how the deviations from Gaussianity evolve with training, in terms of $n$. In particular, using a certain metric on the space of measures, we find that along training the resulting measure is within $n^{-\frac{1}{2}}(\log n)^{1+}$ of the time-dependent Gaussian process corresponding to the infinite-width network. This process is given explicitly by precomposing the initial Gaussian process with the linear flow corresponding to training in the infinite-width limit.
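
To make contribution (1) concrete, here is a toy calculation, a minimal sketch and not the paper's method. It assumes a one-hidden-layer network $f(x) = n^{-1/2}\sum_i a_i\,\mathrm{ReLU}(w_i x)$ with $w_i \sim N(0,1)$ and read-out weights $a_i = E_i - 1$, $E_i \sim \mathrm{Exp}(1)$ (a deliberately skewed, hypothetical initialization). Because $f$ is a rescaled sum of $n$ i.i.d. terms, its $k$-th cumulant scales as $n^{1-k/2}$: the skewness is an $n^{-1/2}$ correction and the excess kurtosis an $n^{-1}$ correction, with coefficients fixed by the moments of $a_i$ and of the activation. With a symmetric distribution for $a_i$ the odd cumulants vanish and the leading correction would instead be of order $1/n$.

```python
import math

# Toy model (an assumption for illustration, not the paper's setup):
#   f(x) = n**-0.5 * sum_i a_i * relu(w_i * x),  w_i ~ N(0, 1),
#   a_i = E_i - 1 with E_i ~ Exp(1)  (mean 0, variance 1, skewed).
# Each hidden unit contributes an i.i.d. term t_i = a_i * relu(w_i * x).

# Central moments E[a^2], E[a^3], E[a^4] of a = Exp(1) - 1.
A2, A3, A4 = 1.0, 2.0, 9.0
# Moments E[relu(Z)^k] for Z ~ N(0, 1); the factor |x|**k cancels in the ratios below.
R2, R3, R4 = 0.5, math.sqrt(2.0 / math.pi), 1.5


def per_unit_shape():
    """Skewness and excess kurtosis of one term t = a * relu(w * x)."""
    m2 = A2 * R2  # E[t^2]; E[t] = 0 because E[a] = 0 and a, w are independent
    m3 = A3 * R3  # E[t^3]
    m4 = A4 * R4  # E[t^4]
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def output_shape(n):
    """Skewness and excess kurtosis of f at width n.

    Cumulants of a sum of n i.i.d. terms scaled by n**-0.5 satisfy
    kappa_k(f) = n**(1 - k/2) * kappa_k(t), hence skewness ~ n**-0.5
    and excess kurtosis ~ n**-1.
    """
    skew_t, exkurt_t = per_unit_shape()
    return skew_t / math.sqrt(n), exkurt_t / n


for n in (10, 100, 1_000, 10_000):
    skew, exkurt = output_shape(n)
    print(f"n={n:>6}: skewness={skew:.4f}, excess kurtosis={exkurt:.4f}")
```

For $n = 100$ this toy model gives a skewness of about 0.45 and an excess kurtosis of 0.51, both shrinking toward the Gaussian values of 0 as $n$ grows.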

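The infinite-width picture behind contributions (2) and (3) can be sketched on the training inputs $X$ alone. Assuming gradient flow with learning rate $\eta$ on the quadratic loss $\frac{1}{2}\|f(X) - y\|^2$ and a fixed limiting NTK $\Theta$, the outputs follow the linear flow $f_t(X) = y + e^{-\eta\Theta t}(f_0(X) - y)$. Pushing $f_0(X) \sim N(0, K)$ through this affine map gives a Gaussian with mean $(I - e^{-\eta\Theta t})y$ and covariance $e^{-\eta\Theta t} K e^{-\eta\Theta t\,\top}$. The matrices `THETA`, `K`, targets `Y`, `ETA`, and the initial outputs used below are hypothetical toy values, and `expm` is a hand-rolled helper (one could instead call `scipy.linalg.expm`); this is a minimal sketch, not the paper's construction of the full time-dependent process.

```python
import math

# Hypothetical toy values for two training inputs (not taken from the paper).
THETA = [[2.0, 0.5], [0.5, 1.0]]  # limiting NTK, positive definite
K = [[1.0, 0.3], [0.3, 1.0]]      # prior covariance of f_0(X)
Y = [1.0, -1.0]                   # regression targets
ETA = 1.0                         # learning rate


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(v) for v in row) for row in A)  # max row-sum norm
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2.0 ** s for v in row] for row in A]  # norm of scaled <= 1/2
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):  # term = scaled**k / k!
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # exp(A) = exp(A / 2**s) ** (2**s)
        result = matmul(result, result)
    return result


def flow_matrix(theta, eta, t):
    """E_t = exp(-eta * t * Theta)."""
    return expm([[-eta * t * v for v in row] for row in theta])


def linear_flow(theta, f0, y, eta, t):
    """Outputs on the training inputs: f_t = y + E_t (f_0 - y)."""
    E = flow_matrix(theta, eta, t)
    r0 = [a - b for a, b in zip(f0, y)]
    return [y[i] + sum(E[i][j] * r0[j] for j in range(len(r0))) for i in range(len(y))]


def gaussian_at_time(theta, prior_cov, y, eta, t):
    """Law of f_t(X) when f_0(X) ~ N(0, prior_cov): mean (I - E_t) y, cov E_t K E_t^T."""
    E = flow_matrix(theta, eta, t)
    mean = [y[i] - sum(E[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]
    E_T = [list(col) for col in zip(*E)]
    cov = matmul(matmul(E, prior_cov), E_T)
    return mean, cov


for t in (0.0, 1.0, 10.0):
    f_t = linear_flow(THETA, [0.2, 0.4], Y, ETA, t)
    mean, cov = gaussian_at_time(THETA, K, Y, ETA, t)
    print(f"t={t:>4}: f_t={[round(v, 4) for v in f_t]}, mean={[round(v, 4) for v in mean]}")
```

Because $\Theta$ is positive definite, every eigenvalue of $e^{-\eta\Theta t}$ decays to zero, so in this idealized setting $f_t(X) \to y$ (a global minimizer of the quadratic loss) and the pushed-forward Gaussian concentrates on $y$ as $t$ grows.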