Optimal Approximation Rate of ReLU Networks in terms of Width and Depth

This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, while existing results are only nearly optimal without the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\sqrt{d}\,\omega_f\big((N^2L^2\ln N)^{-1/d}\big)\big)$, where $\omega_f(\cdot)$ is the modulus of continuity of $f$. We also extend our analysis to any continuous function $f$ on a bounded set. In particular, if ReLU networks of fixed depth and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}\big(\tfrac{\lambda}{W\ln W}\big)$, which has not been discovered in the literature for fixed-depth ReLU networks.
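A minimal sketch of how the parameter-count rate in the last sentence would follow from the width-depth rate above, assuming the stated rate, the special case $d=1$, $\alpha=1$ (Lipschitz), fixed depth $L=\mathcal{O}(1)$, and width $\mathcal{O}(N)$, with all constants suppressed:

\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align*}
% Specialize the general rate to d = 1, alpha = 1 and fixed depth L = O(1):
\lambda\sqrt{d}\,\big(N^{2}L^{2}\ln N\big)^{-\alpha/d}
  &= \mathcal{O}\!\left(\frac{\lambda}{N^{2}\ln N}\right),\\
% A fixed-depth fully connected network of width O(N) has W = O(N^2) parameters,
% hence ln W = Theta(ln N), and the rate can be rewritten in terms of W:
\frac{\lambda}{N^{2}\ln N}
  &= \mathcal{O}\!\left(\frac{\lambda}{W\ln W}\right),
  \qquad W=\mathcal{O}(N^{2}).
\end{align*}
\end{document}

The only ingredient beyond the stated rate is the parameter count: a fixed-depth fully connected network of width $\mathcal{O}(N)$ has $W=\mathcal{O}(N^2)$ weights, so $\ln W=\Theta(\ln N)$.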