Neural Network Architecture Beyond Width and Depth

Abstract

This paper proposes a new neural network architecture by introducing an additional dimension, called height, beyond width and depth. Neural network architectures with height, width, and depth as hyperparameters are called three-dimensional architectures. It is shown that neural networks with three-dimensional architectures are significantly more expressive than those with two-dimensional architectures (networks with only width and depth as hyperparameters), e.g., standard fully connected networks. The new network architecture is constructed recursively via a nested structure, and hence we call a network with the new architecture a nested network (NestNet). A NestNet of height $s$ is built with each hidden neuron activated by a NestNet of height $\le s-1$. When $s=1$, a NestNet degenerates to a standard network with a two-dimensional architecture. It is proved by construction that height-$s$ ReLU NestNets with $\mathcal{O}(n)$ parameters can approximate Lipschitz continuous functions on $[0,1]^d$ with an error $\mathcal{O}(n^{-(s+1)/d})$, while the optimal approximation error of standard ReLU networks with $\mathcal{O}(n)$ parameters is $\mathcal{O}(n^{-2/d})$. Furthermore, this result is extended to generic continuous functions on $[0,1]^d$, with the approximation error characterized by the modulus of continuity. Finally, a numerical example is provided to explore the advantages of the super approximation power of ReLU NestNets.
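
As a rough illustration of the recursive construction described above (not the authors' implementation), the following minimal NumPy sketch builds a height-$s$ NestNet in which each hidden neuron of a height-$s$ network is activated by its own scalar-to-scalar NestNet of height $s-1$, bottoming out in the plain ReLU at height 1. All layer widths, the [1, 3, 1] activation subnetworks, and the random initialization are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class NestNet:
    # A height-s NestNet over the given layer widths. At height 1, hidden
    # neurons use the plain ReLU, so the network reduces to a standard
    # two-dimensional (width/depth) fully connected network. At height
    # s > 1, each hidden neuron is activated by its own scalar-to-scalar
    # NestNet of height s - 1, per the recursive construction.
    def __init__(self, height, widths):
        self.height = height
        self.weights = [rng.standard_normal((m, n)) / np.sqrt(n)
                        for n, m in zip(widths[:-1], widths[1:])]
        self.biases = [np.zeros(m) for m in widths[1:]]
        if height > 1:
            # One small activation subnetwork per hidden neuron; the
            # [1, 3, 1] widths are an arbitrary illustrative choice.
            self.activations = [[NestNet(height - 1, [1, 3, 1])
                                 for _ in range(m)] for m in widths[1:-1]]

    def __call__(self, x):
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            x = W @ x + b
            if i < len(self.weights) - 1:   # activate hidden layers only
                if self.height == 1:
                    x = relu(x)
                else:
                    x = np.array([a(np.array([v]))[0]
                                  for a, v in zip(self.activations[i], x)])
        return x

# Usage: a height-2 NestNet mapping [0,1]^2 to R.
net = NestNet(height=2, widths=[2, 4, 1])
print(net(np.array([0.5, 0.3])))

Note that in this sketch every hidden neuron carries its own activation subnetwork, so the nested subnetworks contribute to the total parameter count; presumably the $\mathcal{O}(n)$ parameter budget in the abstract likewise counts parameters across all levels of the nesting.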
