Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent

Abstract

Estimation of a regression function from independent and identically distributed random variables is considered. The $L_2$ error with integration with respect to the design measure is used as the error criterion. Over-parametrized deep neural network estimates are defined, where all the weights are learned by gradient descent. It is shown that the expected $L_2$ error of these estimates converges to zero at a rate close to $n^{-1/(1+d)}$ when the regression function is Hölder smooth with Hölder exponent $p \in [1/2, 1]$. In the case of an interaction model, where the regression function is assumed to be a sum of Hölder smooth functions each depending on only $d^*$ of the $d$ components of the design variable, it is shown that these estimates achieve the corresponding $d^*$-dimensional rate of convergence.
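
The setting summarized above is a standard deep network in which every weight is updated by plain gradient descent on the empirical $L_2$ risk. The following is a minimal sketch of that setup, not the paper's construction: the width, depth, activation, step size, number of steps, and the Hölder-smooth target below (with exponent $p = 1/2$ in its first component) are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths):
    # One (weight, bias) pair per layer, with 1/sqrt(fan-in) scaling.
    params = []
    for din, dout in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def net(params, x):
    # Fully connected network with a smooth (sigmoid) activation.
    for W, b in params[:-1]:
        x = jax.nn.sigmoid(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

def l2_risk(params, x, y):
    # Empirical L2 risk of the network on the sample.
    return jnp.mean((net(params, x) - y) ** 2)

# i.i.d. data (X, Y) with a Hoelder-smooth regression function m
# (d = 2, Hoelder exponent p = 1/2 in the first component).
key = jax.random.PRNGKey(0)
kx, keps, kp = jax.random.split(key, 3)
n, d = 200, 2
x = jax.random.uniform(kx, (n, d))
m = lambda x: jnp.abs(x[:, 0] - 0.5) ** 0.5 + x[:, 1]
y = m(x) + 0.1 * jax.random.normal(keps, (n,))

# Over-parametrized: the network has far more weights than the n samples.
params = init_params(kp, [d, 256, 256, 1])

# Plain gradient descent on all weights simultaneously.
step, num_steps = 0.05, 2000
grad_fn = jax.jit(jax.grad(l2_risk))
for _ in range(num_steps):
    grads = grad_fn(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - step * g, params, grads)

print("empirical L2 risk after training:", l2_risk(params, x, y))
```

The paper's rates concern the expected $L_2$ error with respect to the design measure, which the empirical risk printed above only approximates on the training sample.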
