Universal Approximation Under Constraints is Possible with Transformers

Many practical problems need the output of a machine learning model to satisfy a set of constraints, K. Nevertheless, there is no known guarantee that classical neural network architectures can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any non-convex compact set K and any continuous function f taking values in K, there is a probabilistic transformer F whose randomized outputs all lie in K and whose expected output uniformly approximates f. Our second main result is a "deep neural version" of Berge's Maximum Theorem (1963). The result guarantees that given an objective function L, a constraint set K, and a family of soft constraint sets, there is a probabilistic transformer F that approximately minimizes L and whose outputs belong to K; moreover, F approximately satisfies the soft constraints. Our results imply the first universal approximation theorem for classical transformers with exact convex constraint satisfaction. They also yield a chart-free universal approximation theorem for Riemannian manifold-valued functions subject to suitable geodesically convex constraints.
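For intuition only, here is a minimal sketch (not the paper's construction) of how a softmax/attention-style output head can enforce exact membership in a convex constraint set K: the head produces convex weights over a fixed finite set of points in K, so the prediction is a convex combination and stays in K. The particle set, shapes, and parameter names below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumption, not the paper's construction): a softmax output
# head, as in attention, yields convex combinations of fixed "particles" in a
# convex set K, so every output lies in K by convexity.

rng = np.random.default_rng(0)

# Hypothetical setup: K is the unit simplex in R^3, represented by a finite
# set of points (particles) sampled inside K.
particles = rng.dirichlet(np.ones(3), size=8)          # shape (8, 3), each row in K

def constrained_head(features, W):
    """Map hidden features to a point in K via softmax weights over particles."""
    logits = features @ W                               # shape (..., 8)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax: nonnegative, sums to 1
    return weights @ particles                          # convex combination -> stays in K

W = rng.normal(size=(16, 8))                            # hypothetical head parameters
x = rng.normal(size=(5, 16))                            # batch of hidden features
y = constrained_head(x, W)
assert np.allclose(y.sum(axis=-1), 1.0) and (y >= 0).all()  # outputs remain in the simplex K
```

Because each output is a convex combination of points of K, convexity of K alone guarantees exact constraint satisfaction, independently of the learned parameters; the paper's probabilistic construction extends this idea to non-convex compact constraint sets via randomized outputs.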