Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model when the network width is large. Indeed, under regularity conditions, the NTK converges to a deterministic, time-independent kernel in the infinite-width limit. This regime is often called the NTK regime. In parallel, recent works on signal propagation (Poole et al., 2016; Schoenholz et al., 2017; Hayou et al., 2019a) studied the impact of the initialization and the activation function on signal propagation in deep neural networks. In this paper, we connect these two theories by quantifying the impact of the initialization and the activation function on the NTK when the network depth becomes large. In particular, we provide a comprehensive analysis of the rate at which the NTK converges to its limiting kernel as the depth goes to infinity.
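To make the object under study concrete, the sketch below computes the empirical NTK, $\Theta(x, x') = \langle \nabla_\theta f_\theta(x), \nabla_\theta f_\theta(x') \rangle$, for a two-layer ReLU network at initialization in JAX. It is not the authors' code: the names (init_params, mlp, ntk_entry), the He-style initialization scaling, and the specific inputs are illustrative assumptions, chosen only to show how the kernel can be evaluated and how it stabilizes as the width grows.

```python
# A minimal sketch (not from the paper): the empirical NTK of a two-layer MLP,
# Theta(x, x') = <grad_theta f(x), grad_theta f(x')>, computed with JAX.
# init_params, mlp, and ntk_entry are illustrative names; widening `width`
# makes the kernel entry concentrate around its infinite-width value.
import jax
import jax.numpy as jnp

def init_params(key, d_in, width):
    k1, k2 = jax.random.split(key)
    # He-style variance scaling, as in the signal-propagation literature (assumed here).
    w1 = jax.random.normal(k1, (width, d_in)) * jnp.sqrt(2.0 / d_in)
    w2 = jax.random.normal(k2, (width,)) * jnp.sqrt(2.0 / width)
    return {"w1": w1, "w2": w2}

def mlp(params, x):
    # Scalar-output ReLU network f_theta(x).
    h = jax.nn.relu(params["w1"] @ x)
    return params["w2"] @ h

def ntk_entry(params, x1, x2):
    # Inner product of parameter gradients at two inputs.
    g1 = jax.grad(mlp)(params, x1)
    g2 = jax.grad(mlp)(params, x2)
    return sum(jnp.vdot(a, b) for a, b in zip(jax.tree_util.tree_leaves(g1),
                                              jax.tree_util.tree_leaves(g2)))

key = jax.random.PRNGKey(0)
x1 = jnp.ones(4) / 2.0
x2 = jnp.arange(4.0) / 4.0
for width in (16, 256, 4096):
    params = init_params(jax.random.fold_in(key, width), d_in=4, width=width)
    print(width, float(ntk_entry(params, x1, x2)))
```

The printed values fluctuate noticeably at small width and settle as the width increases, which is the concentration behaviour the infinite-width NTK regime describes; the paper's question is how this picture changes when the depth, rather than the width, becomes large.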