
Universal Approximation Under Constraints is Possible with Transformers

Abstract

Many practical problems need the output of a machine learning model to satisfy a set of constraints, $K$. Nevertheless, there is no known guarantee that classical neural network architectures can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any non-convex compact set $K$ and any continuous function $f:\mathbb{R}^n\rightarrow K$, there is a probabilistic transformer $\hat{F}$ whose randomized outputs all lie in $K$ and whose expected output uniformly approximates $f$. Our second main result is a "deep neural version" of Berge's Maximum Theorem (1963). The result guarantees that given an objective function $L$, a constraint set $K$, and a family of soft constraint sets, there is a probabilistic transformer $\hat{F}$ that approximately minimizes $L$ and whose outputs belong to $K$; moreover, $\hat{F}$ approximately satisfies the soft constraints. Our results imply the first universal approximation theorem for classical transformers with exact convex constraint satisfaction. They also yield a chart-free universal approximation theorem for Riemannian manifold-valued functions subject to suitable geodesically convex constraints.
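To make the exact-constraint-satisfaction idea concrete, the sketch below is a minimal toy illustration (not the paper's construction): a model head that outputs an attention-style convex combination of a few fixed "anchor" points inside a convex constraint set $K$, so every deterministic output lies in $K$ exactly, and a randomized variant that samples one anchor, so each realized output lies in $K$ while the expected output equals the convex combination. The anchor set, feature dimensions, and weight matrix are hypothetical stand-ins for a transformer body.

```python
# Illustrative sketch only, assuming K is the 2-simplex in R^3 (convex, compact)
# and its vertices are used as anchor points; this is not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constraint set K: convex hull of these anchor points.
anchors = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])          # shape (m, d)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def constrained_head(features, W):
    """Deterministic head: convex combination of anchors, so outputs stay in K."""
    weights = softmax(features @ W)             # (batch, m), rows sum to 1
    return weights @ anchors, weights           # (batch, d)

def randomized_head(features, W):
    """Probabilistic variant: sample one anchor per input; every sample lies in K,
    and the expectation over samples equals the deterministic convex combination."""
    _, weights = constrained_head(features, W)
    idx = np.array([rng.choice(len(anchors), p=w) for w in weights])
    return anchors[idx]

# Toy usage with random features and weights standing in for a transformer body.
features = rng.normal(size=(4, 8))
W = rng.normal(size=(8, len(anchors)))
y_det, _ = constrained_head(features, W)
y_rand = randomized_head(features, W)
assert np.allclose(y_det.sum(axis=1), 1.0) and (y_det >= 0).all()   # outputs in K
assert (y_rand >= 0).all() and np.allclose(y_rand.sum(axis=1), 1.0)  # samples in K
```

For a convex $K$, a convex combination of points of $K$ never leaves $K$, which is the mechanism behind the "exact convex constraint satisfaction" statement; the paper's probabilistic transformer extends the guarantee to non-convex compact $K$ by randomizing over points of $K$ itself.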
