Generalization Ability of Wide Neural Networks on $\mathbb{R}$

Abstract

We study the generalization ability of wide two-layer ReLU neural networks on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): a) $K_{d}$, the NTK defined on $\mathbb{R}^{d}$, is positive definite; b) $\lambda_{i}(K_{1})$, the $i$-th largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that: i) as the width $m\rightarrow\infty$, the neural network kernel (NNK) converges uniformly to the NTK; ii) the minimax rate of regression over the RKHS associated with $K_{1}$ is $n^{-2/3}$; iii) if one adopts the early stopping strategy when training a wide neural network, the resulting network achieves the minimax rate; iv) if one instead trains the network until it overfits the data, the resulting network cannot generalize well. Finally, we provide an explanation reconciling our theory with the widely observed ``benign overfitting phenomenon''.
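The eigenvalue-decay claim lends itself to a quick numerical check. The sketch below is an illustration rather than the paper's construction: the bias term, the Gaussian initialization, the grid on $[0,1]$, and the width $m$ are our own assumptions. It forms the finite-width gradient (tangent) kernel of a two-layer ReLU network on a 1D grid, which the abstract's NNK presumably corresponds to, and estimates the log-log decay slope of its leading eigenvalues; the claim $\lambda_{i}(K_{1})\propto i^{-2}$ suggests a slope close to $-2$.

import numpy as np

# Hypothetical sketch (not the paper's exact setup): empirical tangent kernel
# of a two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r x + b_r)
# at a random Gaussian initialization, evaluated on a grid in [0, 1].
rng = np.random.default_rng(0)
m = 20_000                     # width; "wide" regime
n = 200                        # number of grid points
x = np.linspace(0.0, 1.0, n)   # inputs on a subset of R

w = rng.standard_normal(m)     # first-layer weights
b = rng.standard_normal(m)     # first-layer biases (an assumption)
a = rng.standard_normal(m)     # second-layer weights

u = np.outer(x, w) + b          # pre-activations, shape (n, m)
act = np.maximum(u, 0.0)        # relu(u)
der = (u > 0.0).astype(float)   # relu'(u)

# Sum over parameters of grad_theta f(x) * grad_theta f(x'):
#   d f / d a_r gives relu(u_r(x)) relu(u_r(x'))
#   d f / d w_r and d f / d b_r give a_r^2 relu'(u_r(x)) relu'(u_r(x')) (x x' + 1)
K = act @ act.T / m + ((der * a) @ (der * a).T / m) * (np.outer(x, x) + 1.0)

# Decay of the leading Gram eigenvalues: lambda_i ~ i^{-2} should show up as a
# log-log slope near -2 (up to discretization and finite-width effects).
eig = np.sort(np.linalg.eigvalsh(K))[::-1]
idx = np.arange(1, 51)
slope = np.polyfit(np.log(idx[5:]), np.log(eig[5:50]), 1)[0]
print(f"estimated log-log decay slope over eigenvalues 6-50: {slope:.2f}")

The slope is computed away from the very top of the spectrum, where the power law need not have set in yet; at large width the finite-width kernel should be close to the NTK, so the estimate probes the stated $i^{-2}$ rate rather than finite-width artifacts.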
