Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

Journal of machine learning research (JMLR), 2023

13 July 2023

Abstract

This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$ , $\mathtt{LeakyReLU}$ , $\mathtt{ReLU}^2$ , $\mathtt{ELU}$ , $\mathtt{SELU}$ , $\mathtt{Softplus}$ , $\mathtt{GELU}$ , $\mathtt{SiLU}$ , $\mathtt{Swish}$ , $\mathtt{Mish}$ , $\mathtt{Sigmoid}$ , $\mathtt{Tanh}$ , $\mathtt{Arctan}$ , $\mathtt{Softsign}$ , $\mathtt{dSiLU}$ , and $\mathtt{SRS}$ . We demonstrate that for any activation function $\varrho\in \mathscr{A}$ , a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$ -activated network of width $3N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, albeit with slightly increased constants. Significantly, we establish that the (width, $\,$ depth) scaling factors that appeared in the previous result can be further reduced from $(3,2)$ to $(1,1)$ if $\varrho$ falls within a specific subset of $\mathscr{A}$ . This subset includes activation functions such as $\mathtt{ELU}$ , $\mathtt{SELU}$ , $\mathtt{Softplus}$ , $\mathtt{GELU}$ , $\mathtt{SiLU}$ , $\mathtt{Swish}$ , and $\mathtt{Mish}$ .

View on arXiv

Comments on this paper