Optimal Neural Network Approximation for High-Dimensional Continuous Functions
Recently, the authors of \cite{SYZ22} developed a neural network with width $36d(2d+1)$ and depth $11$, which utilizes a special activation function called the elementary universal activation function, to achieve the super approximation property for functions in $C([a,b]^d)$. That is, the constructed network only requires a fixed number of neurons (and thus parameters) to approximate a $d$-variate continuous function on a $d$-dimensional hypercube with arbitrary accuracy. More specifically, only $\mathcal{O}(d^2)$ neurons or parameters are used. One natural question is whether we can reduce the number of these neurons or parameters in such a network. By leveraging a variant of the Kolmogorov Superposition Theorem, we show that there is a composition of networks generated by the elementary universal activation function with at most $\mathcal{O}(d)$ nonzero parameters such that this super approximation property is attained. The composed network consists of repeated evaluations of two neural networks: one with width $36d$ and the other with width $36$, both having $5$ layers. Furthermore, we present a family of continuous functions that requires at least width $d$, and thus at least $d$ neurons or parameters, to achieve arbitrary accuracy in its approximation. This suggests that the number of nonzero parameters is optimal in the sense that it grows linearly with the input dimension $d$, unlike some approximation methods where the number of parameters may grow exponentially with $d$.
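For orientation, the classical Kolmogorov Superposition Theorem (the paper relies on a variant of it) represents every continuous function on $[0,1]^d$ through sums and compositions of univariate functions:
\[
f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} \Phi_q\!\Bigl(\sum_{p=1}^{d} \phi_{q,p}(x_p)\Bigr),
\]
so the multivariate approximation problem reduces to realizing univariate inner and outer maps with fixed-size subnetworks. The Python sketch below only illustrates that compositional shape; the EUAF formula, the random placeholder weights, and the exact widths and depths are assumptions inferred from the abstract, not the paper's explicit construction.

```python
import numpy as np

def euaf(x):
    # One commonly cited form of the elementary universal activation function
    # (EUAF) of SYZ22: a triangular wave on [0, inf) and softsign on (-inf, 0).
    # The exact formula is recalled from the literature -- treat it as an assumption.
    return np.where(x >= 0.0,
                    np.abs(x - 2.0 * np.floor((x + 1.0) / 2.0)),
                    x / (np.abs(x) + 1.0))

def euaf_net(widths, rng):
    # Fully connected network with EUAF activations and random placeholder weights;
    # the paper constructs explicit weights, which are not reproduced here.
    layers = [(rng.standard_normal((m, n)), rng.standard_normal(n))
              for m, n in zip(widths[:-1], widths[1:])]
    def forward(x):
        for i, (W, b) in enumerate(layers):
            x = x @ W + b
            if i < len(layers) - 1:   # keep the output layer linear
                x = euaf(x)
        return x
    return forward

d = 3
rng = np.random.default_rng(0)
# Two subnetwork shapes mirroring the abstract: width 36d and width 36, 5 layers each.
inner = {(q, p): euaf_net([1, 36 * d, 36 * d, 36 * d, 36 * d, 1], rng)
         for q in range(2 * d + 1) for p in range(d)}
outer = euaf_net([1, 36, 36, 36, 36, 1], rng)

def kst_style(x):
    # f(x) ~ sum_{q=0}^{2d} outer( sum_{p=1}^{d} inner_{q,p}(x_p) ):
    # repeated evaluations of the two subnetworks, as described in the abstract.
    return sum(outer(sum(inner[q, p](x[:, p:p + 1]) for p in range(d)))
               for q in range(2 * d + 1))

x = rng.random((4, d))
print(kst_style(x).shape)   # -> (4, 1)
```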