Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent

4 October 2022

Abstract

Estimation of a regression function from independent and identically distributed random variables is considered. The $L_2$ error, integrated with respect to the design measure, is used as the error criterion. Over-parametrized deep neural network estimates are defined in which all weights are learned by gradient descent. It is shown that the expected $L_2$ error of these estimates converges to zero at a rate close to $n^{-1/(1+d)}$ when the regression function is Hölder smooth with Hölder exponent $p \in [1/2,1]$. For an interaction model, in which the regression function is assumed to be a sum of Hölder smooth functions that each depend on only $d^*$ of the $d$ components of the design variable, it is shown that these estimates achieve the corresponding $d^*$-dimensional rate of convergence.
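The quantities named in the abstract can be written out in standard nonparametric regression notation. This is a sketch under assumed notation, not the paper's own formulas: $(X, Y)$ is a random vector with $X \in \mathbb{R}^d$, $m(x) = \mathbf{E}\{Y \mid X = x\}$ is the regression function, $m_n$ is the estimate computed from $n$ samples, $\mathbf{P}_X$ is the design measure, and $C > 0$ is a constant.

```latex
% Error criterion (assumed notation): expected L_2 error w.r.t. the design measure P_X
\mathbf{E} \int |m_n(x) - m(x)|^2 \, \mathbf{P}_X(dx)

% Hölder smoothness with exponent p \in [1/2, 1] and constant C > 0
|m(x) - m(z)| \le C \, \|x - z\|^p \quad \text{for all } x, z \in \mathbb{R}^d

% Interaction model: each m_I depends only on the d^* coordinates x_I = (x^{(i)})_{i \in I}
m(x) = \sum_{I \subseteq \{1, \dots, d\},\ |I| = d^*} m_I(x_I)
```

Assuming the $d^*$-dimensional rate is obtained by replacing $d$ with $d^*$ in $n^{-1/(1+d)}$, the gain can be large. For example, with $d = 10$ and $d^* = 2$ the exponent improves from $1/11$ to $1/3$.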
