
How Many Neurons Does it Take to Approximate the Maximum?

Abstract

We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous distributions, for a network that uses ReLU activations. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ construction which approximates the maximum function. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we are able to use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest, as they apply to the widely studied and used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
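For intuition (this is a standard baseline, not the paper's construction), the exact identity $\max(x,y) = \mathrm{ReLU}(x-y) + \mathrm{ReLU}(y) - \mathrm{ReLU}(-y)$ shows that a single hidden layer of three ReLU neurons computes the maximum of two numbers exactly; composing it in a pairwise tournament gives an exact depth-$\mathcal{O}(\log d)$, width-$\mathcal{O}(d)$ network. The NumPy sketch below illustrates this baseline; the function names are illustrative, and the paper's shallower $\mathcal{O}(\log(\log(d)))$ construction is more involved and not reproduced here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max_pair(x, y):
    # Exact identity: max(x, y) = ReLU(x - y) + ReLU(y) - ReLU(-y),
    # i.e. one hidden layer of three ReLU neurons plus a linear output.
    return relu(x - y) + relu(y) - relu(-y)

def tournament_max(v):
    # Pairwise-tournament network: each round halves the number of
    # candidates, so the depth is O(log d) with O(d) neurons per layer.
    v = np.asarray(v, dtype=float)
    while v.size > 1:
        if v.size % 2 == 1:                 # carry the odd element forward
            v = np.append(v, v[-1])
        v = max_pair(v[0::2], v[1::2])      # one "layer" of pairwise maxima
    return v[0]

x = np.random.randn(13)
assert np.isclose(tournament_max(x), x.max())
```

Because each round is exact, this baseline already achieves zero approximation error; the interest of the paper's bounds is in how much width is needed when the depth budget is far smaller (e.g. depth 2, 3, or $\mathcal{O}(\log(\log(d)))$).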
