Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling

Main: 40 pages
8 figures
Bibliography: 2 pages
Abstract
We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if \(X\) is an i.i.d. random matrix, \(W\) is an i.i.d. matrix and \(D\) is a diagonal matrix with i.i.d. bounded entries, we consider the matrix
\[\mathrm{NTK}=\frac{1}{d}XX^\top\odot\frac{1}{p}\sigma'\left(\frac{1}{\sqrt{d}}XW\right)D^2\sigma'\left(\frac{1}{\sqrt{d}}XW\right)^\top\]
where \(\sigma'\) is a pseudo-Lipschitz function applied entrywise, in the quadratic scaling of the dimensions. We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on \(\sigma\) and \(D\).
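As a rough numerical illustration of the matrix defined above, the following sketch builds it for small sizes and computes its eigenvalues. Everything beyond the formula itself is an assumption made for the example: the names `ntk_matrix` and `tanh_prime`, the Gaussian entries of `X` and `W`, the uniform entries of `D`, the choice of the derivative of tanh for \(\sigma'\), and the sizes `n`, `d`, `p`, which are arbitrary and not meant to reproduce the asymptotic regime of the paper.

```python
import numpy as np


def ntk_matrix(X, W, D_diag, sigma_prime):
    """Return (1/d) X X^T  ⊙  (1/p) s'(XW/sqrt(d)) D^2 s'(XW/sqrt(d))^T.

    X is n x d, W is d x p, D_diag holds the p diagonal entries of D,
    and sigma_prime is applied entrywise.
    """
    n, d = X.shape
    p = W.shape[1]
    S = sigma_prime(X @ W / np.sqrt(d))      # n x p, entrywise sigma'
    gram = X @ X.T / d                       # first factor, n x n
    feat = (S * D_diag**2) @ S.T / p         # S D^2 S^T / p, n x n
    return gram * feat                       # Hadamard (entrywise) product


def tanh_prime(z):
    # Derivative of tanh, used here only as an example choice of sigma'.
    return 1.0 - np.tanh(z) ** 2


rng = np.random.default_rng(0)
n, d, p = 200, 20, 30                        # illustrative sizes (assumption)
X = rng.standard_normal((n, d))              # i.i.d. entries (assumed Gaussian)
W = rng.standard_normal((d, p))              # i.i.d. entries (assumed Gaussian)
D_diag = rng.uniform(-1.0, 1.0, size=p)      # i.i.d. bounded diagonal entries

K = ntk_matrix(X, W, D_diag, tanh_prime)
eigs = np.linalg.eigvalsh(K)                 # real eigenvalues, ascending order
```

A histogram of `eigs` gives the empirical eigenvalue distribution for these finite sizes. Both factors are positive semidefinite, so by the Schur product theorem their Hadamard product is as well, and the eigenvalues are nonnegative up to rounding error.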