
Neural Network Approximation: Three Hidden Layers Are Enough

Abstract

A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron; hence we call such networks Floor-Exponential-Step (FLES) networks. For any width hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width $\max\{d,N\}$ and three hidden layers can uniformly approximate a Hölder continuous function $f$ on $[0,1]^d$ with an exponential approximation rate $3\lambda(2\sqrt{d})^{\alpha}2^{-\alpha N}$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $2\omega_f(2\sqrt{d})\,2^{-N}+\omega_f(2\sqrt{d}\,2^{-N})$. Moreover, we extend this result to general bounded continuous functions on a bounded set $E\subseteq\mathbb{R}^d$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim r^\alpha$ for Hölder continuous functions), since the major term of concern in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ independent of $d$ within the modulus of continuity. Finally, we extend our analysis to derive similar approximation results in the $L^p$-norm for $p\in[1,\infty)$ by replacing the Floor-Exponential-Step activation functions with continuous activation functions.
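To illustrate the stated rates (not the authors' construction), the following minimal Python sketch evaluates the Hölder-case bound $3\lambda(2\sqrt{d})^{\alpha}2^{-\alpha N}$ and the general bound $2\omega_f(2\sqrt{d})\,2^{-N}+\omega_f(2\sqrt{d}\,2^{-N})$; the function names and the example modulus are illustrative assumptions.

```python
import math

def holder_rate(d: int, N: int, alpha: float, lam: float) -> float:
    # Hölder-case bound from the abstract: 3 * lambda * (2*sqrt(d))^alpha * 2^(-alpha*N)
    return 3.0 * lam * (2.0 * math.sqrt(d)) ** alpha * 2.0 ** (-alpha * N)

def general_rate(d: int, N: int, omega) -> float:
    # General bound for a modulus of continuity omega:
    # 2 * omega(2*sqrt(d)) * 2^(-N) + omega(2*sqrt(d) * 2^(-N))
    r = 2.0 * math.sqrt(d)
    return 2.0 * omega(r) * 2.0 ** (-N) + omega(r * 2.0 ** (-N))

# Example: for a Hölder modulus omega(r) = r^alpha, the bound decays like
# 2^(-alpha*N) in the width hyper-parameter N, while the dimension d enters
# only through sqrt(d) inside the modulus of continuity.
print(holder_rate(d=100, N=20, alpha=0.5, lam=1.0))
print(general_rate(d=100, N=20, omega=lambda r: r ** 0.5))
```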
