
Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

Journal of Machine Learning Research (JMLR), 2023
Abstract

This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho \in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
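The theorem is constructive, but even a one-neuron example conveys why activations in $\mathscr{A}$ can emulate $\mathtt{ReLU}$ on a bounded set: a rescaled $\mathtt{SiLU}$ unit, $k^{-1}\,\mathtt{SiLU}(kx) = x\,\sigma(kx)$, converges uniformly to $\mathtt{ReLU}(x)$ as the weight $k$ grows, with the architecture held fixed. The NumPy sketch below is our own illustration of this limiting behavior, not the paper's width-$4N$, depth-$2L$ construction; all function names are ours.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Numerically stable logistic function.
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

def silu(x):
    # SiLU (a.k.a. Swish-1): x * sigmoid(x).
    return x * sigmoid(x)

# A single SiLU neuron with input weight k and output weight 1/k:
# SiLU(k*x)/k = x * sigmoid(k*x) -> x * 1_{x>0} = ReLU(x),
# uniformly on any bounded interval as k grows (sup error ~ 0.28/k).
B = 10.0
x = np.linspace(-B, B, 100_001)
for k in [1.0, 10.0, 100.0, 1000.0]:
    err = np.max(np.abs(silu(k * x) / k - relu(x)))
    print(f"k = {k:6.0f}   sup error on [-{B:.0f}, {B:.0f}]: {err:.2e}")
```

The printed errors decay like $1/k$, matching the intuition that arbitrary precision is reachable by scaling weights while the network shape stays fixed; the paper's result makes this rigorous, uniformly over whole $\mathtt{ReLU}$ networks and over every activation in $\mathscr{A}$.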
