
Neural Network Approximation: Three Hidden Layers Are Enough

Neural Networks (NN), 2020
Abstract

A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($\mathbf{1}_{x\geq 0}$), or their compositions as the activation function in each neuron; hence we call such networks Floor-Exponential-Step (FLES) networks. For any width hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width $\max\{d,\,N\}$ and three hidden layers can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an exponential approximation rate $3\lambda d^{\alpha/2}2^{-\alpha N}$, where $\alpha\in(0,1]$ and $\lambda$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,2^{-N})+2\omega_f(\sqrt{d})\,2^{-N}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim r^\alpha$ for Hölder continuous functions), since the dominant term in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ independent of $d$ within the modulus of continuity.
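To make the ingredients concrete, below is a minimal NumPy sketch of the three activation choices named in the abstract and a generic three-hidden-layer forward pass of width $\max\{d, N\}$. The random weights, the sample values of $d$ and $N$, and the `fles_forward` helper are illustrative assumptions for this sketch only; they are not the paper's explicit construction, whose weights are chosen analytically to achieve the stated rate.

```python
import numpy as np

# The three FLES activation choices named in the abstract.
def floor_act(x):
    return np.floor(x)                 # floor function: |_x_|

def exp2_act(x):
    return np.exp2(x)                  # exponential function: 2^x

def step_act(x):
    return (x >= 0).astype(x.dtype)    # step function: 1_{x >= 0}

def fles_forward(x, layers):
    """Generic forward pass: each hidden layer is (W, b, activation),
    where the activation is one of the FLES choices (or a composition)."""
    h = x
    for W, b, act in layers:
        h = act(h @ W.T + b)
    return h

# Hypothetical instantiation: input dimension d, width max(d, N),
# three hidden layers with random weights (illustrative only).
d, N = 4, 8
width = max(d, N)
rng = np.random.default_rng(0)

dims = [d, width, width, width]
acts = [floor_act, exp2_act, step_act]
layers = [(rng.standard_normal((dims[i + 1], dims[i])),
           rng.standard_normal(dims[i + 1]),
           acts[i])
          for i in range(3)]

x = rng.uniform(0, 1, size=(5, d))     # five sample points in [0,1]^d
print(fles_forward(x, layers).shape)   # -> (5, 8)
```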
