A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory
We consider the problem of exact computation of the maximum function over real inputs using ReLU neural networks. We prove a depth hierarchy, wherein super-linear width (in the number of inputs) is necessary to represent the maximum at any fixed depth. This is the first unconditional super-linear lower bound for this fundamental operator in this depth regime, and it holds even if the depth scales with the number of inputs. Our proof technique is combinatorial: it associates the non-differentiable ridges of the maximum with cliques in a graph induced by the first hidden layer of the computing network, and invokes Turán's theorem from extremal graph theory to show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that, despite its simple nature, the maximum function possesses an inherent complexity stemming from the geometric structure of its non-differentiable hyperplanes, and it provides a novel approach for proving lower bounds for deep neural networks.
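For context on the upper-bound side, the maximum of two numbers has a classical exact ReLU representation, max(x, y) = ((x + y) + |x − y|) / 2, and composing it in a tournament computes the maximum of d inputs exactly with roughly log₂(d) ReLU layers. The sketch below is this textbook construction, not the paper's; the function names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max2(x, y):
    # Exact identity: max(x, y) = ((x + y) + |x - y|) / 2, realized with
    # four ReLU units, since s = relu(s) - relu(-s) and |s| = relu(s) + relu(-s).
    return 0.5 * (relu(x + y) - relu(-x - y) + relu(x - y) + relu(y - x))

def max_net(values):
    # Tournament of pairwise maxima: the maximum of d inputs computed exactly
    # by about log2(d) rounds of the two-input ReLU gadget above.
    vals = list(values)
    while len(vals) > 1:
        nxt = [max2(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # odd element passes to the next round
        vals = nxt
    return vals[0]
```

The lower bound in the abstract complements this construction: it shows that if one instead insists on a shallower (fixed-depth) network, the width cannot stay linear in the number of inputs.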