MLPs at the EOC: Spectrum of the NTK

Main: 15 pages
1 figure
Bibliography: 2 pages
Appendix: 1 page
Abstract

We study the properties of the Neural Tangent Kernel (NTK) $\overset{\scriptscriptstyle\infty}{K} : \mathbb{R}^{m_0} \times \mathbb{R}^{m_0} \to \mathbb{R}^{m_l \times m_l}$ corresponding to infinitely wide $l$-layer Multilayer Perceptrons (MLPs) mapping inputs in $\mathbb{R}^{m_0}$ to outputs in $\mathbb{R}^{m_l}$, equipped with activation functions $\phi(s) = a s + b \vert s \vert$ for some $a, b \in \mathbb{R}$ and initialized at the Edge of Chaos (EOC). We find that the entries $\overset{\scriptscriptstyle\infty}{K}(x_1, x_2)$ can be approximated by the inverses of the cosine distances of the activations corresponding to $x_1$ and $x_2$, with the approximation becoming tighter as the depth $l$ increases. By quantifying these inverse cosine distances and the spectrum of the matrix containing them, we obtain tight spectral bounds for the NTK matrix $\overset{\scriptscriptstyle\infty}{K} = [\frac{1}{n} \overset{\scriptscriptstyle\infty}{K}(x_{i_1}, x_{i_2}) : i_1, i_2 \in [1:n]]$ over a dataset $\{x_1, \dots, x_n\} \subset \mathbb{R}^{m_0}$, transferred from the inverse cosine distance matrix via our approximation result. Our results show that $\Delta_\phi = \frac{b^2}{a^2 + b^2}$ determines the rate at which the condition number of the NTK matrix converges to its limit as depth increases, implying in particular that the absolute value activation ($\Delta_\phi = 1$) is better than the ReLU ($\Delta_\phi = \frac{1}{2}$) in this regard.
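To make the quantity $\Delta_\phi$ concrete, here is a minimal sketch (not from the paper) that evaluates it for the two activations compared in the abstract. The $(a, b)$ parametrizations are standard identities: ReLU satisfies $\max(0, s) = \tfrac{1}{2}s + \tfrac{1}{2}\vert s \vert$, so $a = b = \tfrac{1}{2}$, while the absolute value is $a = 0$, $b = 1$; the helper name `delta_phi` is ours.

```python
# Sketch: Delta_phi = b^2 / (a^2 + b^2) for activations phi(s) = a*s + b*|s|.
# (a, b) values below come from rewriting each activation in that form:
#   ReLU: max(0, s) = 0.5*s + 0.5*|s|  ->  a = 0.5, b = 0.5
#   Abs:  |s|       = 0.0*s + 1.0*|s|  ->  a = 0.0, b = 1.0

def delta_phi(a: float, b: float) -> float:
    """Return Delta_phi = b^2 / (a^2 + b^2) for phi(s) = a*s + b*|s|."""
    return b**2 / (a**2 + b**2)

print(delta_phi(0.5, 0.5))  # ReLU           -> 0.5
print(delta_phi(0.0, 1.0))  # absolute value -> 1.0
```

The abstract's claim is that a larger $\Delta_\phi$ yields faster convergence of the NTK matrix's condition number to its limit with depth, which is why the absolute value ($\Delta_\phi = 1$) is preferred over ReLU ($\Delta_\phi = \tfrac{1}{2}$) in this respect.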
