
Optimal Neural Network Approximation for High-Dimensional Continuous Functions

Main: 7 pages, 4 figures; bibliography: 2 pages
Abstract

Recently, the authors of \cite{SYZ22} developed a neural network with width $36d(2d+1)$ and depth $11$, which utilizes a special activation function called the elementary universal activation function, to achieve the super approximation property for functions in $C([a,b]^d)$. That is, the constructed network only requires a fixed number of neurons (and thus parameters) to approximate a $d$-variate continuous function on a $d$-dimensional hypercube with arbitrary accuracy. More specifically, only $\mathcal{O}(d^2)$ neurons or parameters are used. A natural question is whether the number of neurons or parameters in such a network can be reduced. By leveraging a variant of the Kolmogorov Superposition Theorem, we show that there is a composition of networks generated by the elementary universal activation function with at most $10889d + 10887$ nonzero parameters that attains this super approximation property. The composed network consists of repeated evaluations of two neural networks: one with width $36(2d+1)$ and the other with width $36$, both having $5$ layers. Furthermore, we present a family of continuous functions that requires width at least $d$, and thus at least $d$ neurons or parameters, to be approximated with arbitrary accuracy. This suggests that the number of nonzero parameters is optimal in the sense that it grows linearly with the input dimension $d$, unlike some approximation methods whose parameter counts grow exponentially with $d$.
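As a point of orientation (a sketch of the classical statement, not the specific variant used in the paper), the widths involving the factor $2d+1$ echo the structure of the Kolmogorov Superposition Theorem, which represents any continuous multivariate function through $2d+1$ outer functions and $d(2d+1)$ inner univariate functions:
\[
f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} \Phi_q\!\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right),
\qquad f \in C([0,1]^d),
\]
where the $\Phi_q$ and $\phi_{q,p}$ are continuous univariate functions. Approximating the outer and inner functions by the two fixed-size networks described above is what allows the total nonzero parameter count to scale linearly in $d$.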
