
Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling

Main: 40 Pages
8 Figures
Bibliography: 2 Pages
Abstract

We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if \(X\in\mathbb{R}^{n\times d}\) is an i.i.d. random matrix, \(W\in\mathbb{R}^{d\times p}\) is an i.i.d. \(\mathcal{N}(0,1)\) matrix and \(D\in\mathbb{R}^{p\times p}\) is a diagonal matrix with i.i.d. bounded entries, we consider the matrix
\[\mathrm{NTK}=\frac{1}{d}XX^\top\odot\frac{1}{p}\sigma'\left(\frac{1}{\sqrt{d}}XW\right)D^2\sigma'\left(\frac{1}{\sqrt{d}}XW\right)^\top\]
where \(\sigma'\) is a pseudo-Lipschitz function applied entrywise, under the scaling \(\frac{n}{dp}\to \gamma_1\) and \(\frac{p}{d}\to \gamma_2\). We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on \(\sigma\) and \(D\).
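
To make the definition concrete, the following is a minimal NumPy sketch that builds the matrix above for finite \(n\), \(d\), \(p\) and computes its spectrum. It is an illustration under assumptions that are not taken from the paper: the entries of \(X\) are standard Gaussian, the diagonal of \(D\) is uniform on \([-1,1]\), and \(\sigma\) is the softplus function, so that \(\sigma'\) is the logistic sigmoid. The function name `ntk_matrix` and the chosen sizes are hypothetical.

```python
import numpy as np


def ntk_matrix(X, W, D_diag, dsigma):
    """Return (1/d) X X^T  ⊙  (1/p) σ'(XW/√d) D² σ'(XW/√d)^T.

    X: (n, d) data matrix, W: (d, p) weights, D_diag: (p,) diagonal of D,
    dsigma: callable applying σ' entrywise.
    """
    n, d = X.shape
    p = W.shape[1]
    gram = X @ X.T / d                      # (n, n) input Gram matrix
    S = dsigma(X @ W / np.sqrt(d))          # (n, p) entrywise σ'
    feat = (S * D_diag**2) @ S.T / p        # (n, n), equals S D² S^T / p
    return gram * feat                      # Hadamard (entrywise) product


rng = np.random.default_rng(0)
d, p = 20, 40                               # p/d = 2 plays the role of γ₂
n = 400                                     # n/(dp) = 0.5 plays the role of γ₁

X = rng.standard_normal((n, d))             # assumption: Gaussian i.i.d. entries
W = rng.standard_normal((d, p))             # i.i.d. N(0,1), as in the abstract
D_diag = rng.uniform(-1.0, 1.0, size=p)     # assumption: bounded i.i.d. entries


def sigmoid(z):
    # derivative of softplus; an assumed choice of σ'
    return 1.0 / (1.0 + np.exp(-z))


K = ntk_matrix(X, W, D_diag, sigmoid)
eigs = np.linalg.eigvalsh(K)                # real eigenvalues of the symmetric matrix K
```

The array `eigs` can be histogrammed to obtain the empirical spectral distribution at these sizes. Because the matrix is a Hadamard product of two positive semidefinite matrices, its eigenvalues are nonnegative up to round-off (Schur product theorem).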
