
Deep Network Approximation in Terms of Intrinsic Parameters

Abstract

One of the arguments used to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity generally comes with an explosive growth in the number of parameters, which in turn leads to high computational costs. It is therefore of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting to the target function. From an approximation perspective, this paper shows that the number of parameters that need to be learned can be significantly smaller than one might expect. First, we theoretically design ReLU networks with only a few learnable parameters that achieve an attractive approximation. We prove by construction that, for any Lipschitz continuous function $f$ on $[0,1]^d$ with Lipschitz constant $\lambda>0$, a ReLU network with $n+2$ intrinsic parameters (those depending on $f$) can approximate $f$ with an exponentially small error $5\lambda\sqrt{d}\,2^{-n}$. This result is then generalized to generic continuous functions. Furthermore, we show that the idea of learning a small number of parameters to achieve a good approximation can be observed numerically. We conduct several experiments to verify that training a small subset of the parameters can also achieve good results on classification problems if the remaining parameters are pre-specified or pre-trained on a related problem.
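As a rough illustration of how quickly the stated bound decays, the short Python sketch below simply evaluates $5\lambda\sqrt{d}\,2^{-n}$ for a hypothetical 1-Lipschitz target on $[0,1]^{10}$; the function name and the sample values of $\lambda$, $d$, and $n$ are our own choices for illustration, not taken from the paper.

```python
import math

def intrinsic_error_bound(lam: float, d: int, n: int) -> float:
    """Evaluate the abstract's error bound 5 * lambda * sqrt(d) * 2^(-n)."""
    return 5.0 * lam * math.sqrt(d) * 2.0 ** (-n)

# Example: a 1-Lipschitz function on [0,1]^10, for a few values of n
for n in (10, 20, 30):
    print(f"n = {n:2d}  intrinsic parameters = {n + 2:2d}  "
          f"error bound = {intrinsic_error_bound(lam=1.0, d=10, n=n):.3e}")
```

The point of the sketch is only that the bound is exponential in $n$: each additional intrinsic parameter roughly halves the guaranteed approximation error.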
