Comparing the Parameter Complexity of Hypernetworks and the Embedding-Based Alternative

Abstract

In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, we compare two alternative methods: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x, e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x; \theta_I)$ are produced by a hypernetwork $f$ as $\theta_I = f(I)$. We extend the theory of~\cite{devore} and provide a lower bound on the complexity of neural networks as function approximators, i.e., on the number of trainable parameters. This extension eliminates the requirement that the approximation method be robust. Our results are then used to compare the complexities of $q$ and $g$, showing that, under certain conditions and when the functions $e$ and $f$ are allowed to be as large as we wish, $g$ can be smaller than $q$ by orders of magnitude. In addition, we show that, under typical assumptions on the function to be approximated, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and of an embedding-based method.
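The two parameterizations compared above can be sketched as follows. This is a minimal illustrative example, not the paper's construction: all layer sizes, the linear forms of $e$ and $f$, and the one-hidden-layer shape of $q$ and $g$ are hypothetical choices made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for the sketch.
d_x, d_I, d_e, d_hidden = 3, 4, 8, 16

# (i) Embedding-based method: a fixed target function q(x, e(I)) that
# consumes the concatenation of the input x and an embedding e(I).
W_e = rng.standard_normal((d_e, d_I))              # embedding e(I) = W_e @ I
W_q1 = rng.standard_normal((d_hidden, d_x + d_e))  # q's hidden layer
w_q2 = rng.standard_normal(d_hidden)               # q's output layer

def h_embedding(x, I):
    e = W_e @ I
    z = np.tanh(W_q1 @ np.concatenate([x, e]))
    return float(w_q2 @ z)

# (ii) Hypernetwork: f maps I to the weights theta_I of a small target
# network g(x; theta_I); here g is a one-hidden-layer net in x only.
n_theta = d_hidden * d_x + d_hidden                # size of theta_I, i.e., g's weights
W_f = rng.standard_normal((n_theta, d_I))          # hypernetwork f(I) = W_f @ I

def h_hyper(x, I):
    theta = W_f @ I                                # theta_I = f(I)
    W_g = theta[:d_hidden * d_x].reshape(d_hidden, d_x)
    w_out = theta[d_hidden * d_x:]
    return float(w_out @ np.tanh(W_g @ x))

x = rng.standard_normal(d_x)
I = rng.standard_normal(d_I)
print(h_embedding(x, I), h_hyper(x, I))
```

Note that in the hypernetwork setting the target function $g$ is specified by only `n_theta` weights regardless of how large $f$ is, which is the sense in which $g$ can be much smaller than $q$.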
