
Parallel Layer Normalization for Universal Approximation

Main: 8 pages · 5 figures · 2 tables · Bibliography: 2 pages · Appendix: 35 pages
Abstract

This paper studies the approximation capabilities of neural networks that combine layer normalization (LN) with linear layers. We prove that networks consisting of two linear layers with parallel layer normalizations (PLNs) inserted between them (referred to as PLN-Nets) achieve universal approximation, whereas architectures that use only standard LN exhibit strictly limited expressiveness. We further analyze approximation rates of shallow and deep PLN-Nets under the L^\infty norm as well as in Sobolev norms. Our analysis extends beyond LN to RMSNorm, and from standard MLPs to position-wise feed-forward networks, the core building blocks used in RNNs and Transformers. Finally, we provide empirical experiments to explore further potential of PLN-Nets.
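As a concrete illustration of the architecture the abstract describes, below is a minimal PyTorch sketch of a PLN-Net, under the assumption that "parallel layer normalization" means applying independent LN modules to disjoint groups of the hidden features. The class name PLNNet, the group partitioning, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PLNNet(nn.Module):
    """Minimal sketch of a PLN-Net: Linear -> parallel LayerNorms over
    disjoint feature groups -> Linear. Illustrative only; not the
    paper's reference implementation."""
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, num_groups: int):
        super().__init__()
        assert hidden_dim % num_groups == 0, "hidden_dim must split evenly into groups"
        self.group_size = hidden_dim // num_groups
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        # One LayerNorm per feature group, applied in parallel.
        self.norms = nn.ModuleList(
            nn.LayerNorm(self.group_size) for _ in range(num_groups)
        )
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc1(x)
        # Split the hidden vector into groups; normalize each group independently.
        chunks = torch.split(h, self.group_size, dim=-1)
        h = torch.cat([ln(c) for ln, c in zip(self.norms, chunks)], dim=-1)
        return self.fc2(h)

# Usage: a scalar-valued network on R^2.
net = PLNNet(in_dim=2, hidden_dim=64, out_dim=1, num_groups=16)
out = net(torch.randn(8, 2))  # shape: (8, 1)
```

With num_groups=1 this reduces to the single-LN architecture whose expressiveness the paper shows is strictly limited; using several groups is what the universal approximation result concerns.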
