Mixture of Experts Soften the Curse of Dimensionality in Operator Learning

Abstract

In this paper, we construct a mixture of neural operators (MoNOs) between function spaces whose complexity is distributed over a network of expert neural operators (NOs), with each NO satisfying parameter scaling restrictions. Our main result is a \textit{distributed} universal approximation theorem guaranteeing that any Lipschitz non-linear operator between $L^2([0,1]^d)$ spaces can be approximated uniformly over the Sobolev unit ball therein, to any given accuracy $\varepsilon>0$, by an MoNO while satisfying the constraint that each expert NO has a depth, width, and rank of $\mathcal{O}(\varepsilon^{-1})$. Naturally, our result implies that the required number of experts must be large; however, each NO is guaranteed to be small enough to be loadable into the active memory of most computers for reasonable accuracies $\varepsilon$. During our analysis, we also obtain new quantitative expression rates for classical NOs approximating uniformly continuous non-linear operators uniformly on compact subsets of $L^2([0,1]^d)$.
