Deep Semi-Random Features for Nonlinear Function Approximation

Abstract

We propose semi-random features for nonlinear function approximation. The flexibility of semi-random features lies between that of the fully adjustable units in deep learning and the random features used in kernel methods. For one-hidden-layer models with semi-random features, we prove, with no unrealistic assumptions, that the model class contains an arbitrarily good function as the width increases (universality) and that, despite non-convexity, we can find such a good function (optimization theory) that generalizes to unseen data (generalization bound). For deep models, again with no unrealistic assumptions, we prove universal approximation ability, a lower bound on the approximation error, a partial optimization guarantee despite non-convexity, and a generalization bound. Our generalization bound can be independent of the depth and the number of trainable weights: with deep models, we can easily generalize to unseen data even if we have more weights than training data points. In experiments, we show that semi-random features can match the performance of neural networks while using slightly more units, and that they outperform random features while using significantly fewer units. Semi-random features provide an interesting data point between kernel methods and neural networks, advancing our understanding of the challenge of nonlinear function approximation and opening new avenues to tackle it further.
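The abstract does not define the unit form, but a minimal sketch can illustrate the idea under one common instantiation: a semi-random unit that computes sigma(r^T x) * (w^T x), where the gate direction r is sampled once at random and then frozen, and only the linear weights w (together with the output layer) are trained. With a hard step gate, the features become linear in the trainable parameters once the random gates are fixed, so fitting the one-hidden-layer model reduces to a convex (ridge) regression; this is one way to read the abstract's optimization claim. All names and dimensions below (n_units, lam, the toy target) are illustrative assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    n, d, n_units = 200, 5, 64                 # samples, input dim, hidden width (illustrative)
    X = rng.standard_normal((n, d))            # toy inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # toy nonlinear target

    # Semi-random layer: gate directions R are drawn at random and frozen;
    # only the linear part of each unit is trainable.
    R = rng.standard_normal((d, n_units))
    gates = (X @ R > 0).astype(X.dtype)        # step gate sigma(r^T x) in {0, 1}

    # Each unit contributes sigma(r^T x) * (w^T x). Absorbing the output weight
    # into w, the model is linear in the stacked w's once the gates are fixed,
    # so the fit below is an ordinary (convex) ridge regression.
    Phi = (gates[:, :, None] * X[:, None, :]).reshape(n, n_units * d)

    lam = 1e-3                                 # ridge penalty (illustrative)
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_units * d), Phi.T @ y)
    print("train MSE:", np.mean((Phi @ theta - y) ** 2))

A deep variant would presumably stack such layers, with each layer's random gate directions applied to the previous layer's output and only the linear weights trained at each layer; the exact construction is given in the paper body, not in the abstract.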
