Universal Regular Conditional Distributions via Probability
Measure-Valued Deep Neural Models
This paper introduces a general framework for explicitly constructing universal deep neural models with inputs from a complete, separable, and locally-compact metric space $\mathcal{X}$ and outputs in the Wasserstein-1 space $\mathcal{P}_1(\mathcal{Y})$ over a complete and separable metric space $\mathcal{Y}$. We find that any model built using the proposed framework is dense in the space $C(\mathcal{X}, \mathcal{P}_1(\mathcal{Y}))$ of continuous functions from $\mathcal{X}$ to $\mathcal{P}_1(\mathcal{Y})$, in the corresponding topology of uniform convergence on compacts, and with quantitative rates. We identify two methods by which the curse of dimensionality can be broken. The first approach constructs subsets of $C(\mathcal{X}, \mathcal{P}_1(\mathcal{Y}))$ consisting of functions that can be efficiently approximated. In the second approach, given any fixed $f \in C(\mathcal{X}, \mathcal{P}_1(\mathcal{Y}))$, we build non-trivial subsets of $\mathcal{X}$ on which $f$ can be efficiently approximated. The results are applied to three open problems lying at the interface of applied probability and computational learning theory. We find that the proposed models can approximate any regular conditional distribution of a $\mathcal{Y}$-valued random element $Y$ depending on an $\mathcal{X}$-valued random element $X$, with arbitrarily high probability. The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models. The proposed framework is used to derive an affirmative answer to the open conjecture of Bishop (1994), namely, that mixture density networks are generic regular conditional distributions. Numerical experiments are performed in the contexts of extreme learning machines, randomized DNNs, and heteroscedastic regression.
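To make the central object concrete, the following is a minimal sketch of a probability measure-valued network: a feed-forward body maps an input $x$ to softmax weights over a finite set of atoms, so that the output is a finite mixture of Dirac masses, i.e., an element of the Wasserstein-1 space over $\mathbb{R}$. The architecture, layer sizes, use of trainable atoms, and the grid-based 1-D Wasserstein-1 computation are illustrative assumptions for this sketch, not the paper's exact construction.

```python
# Minimal sketch (illustrative, not the paper's architecture): a network whose
# output is a probability measure sum_k w_k(x) * delta_{y_k} on the real line.
import torch
import torch.nn as nn


class MeasureValuedNet(nn.Module):
    def __init__(self, in_dim: int, n_atoms: int = 32, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_atoms),          # logits over the atoms
        )
        # Trainable atom locations y_1, ..., y_K in the output space (here R).
        self.atoms = nn.Parameter(torch.linspace(-3.0, 3.0, n_atoms))

    def forward(self, x: torch.Tensor):
        """Return atom locations and mixture weights (a discrete measure)."""
        weights = torch.softmax(self.body(x), dim=-1)
        return self.atoms, weights


def wasserstein1_1d(atoms_p, w_p, atoms_q, w_q, grid):
    """W1 between two discrete measures on R via the CDF formula
    W1(P, Q) = integral of |F_P(t) - F_Q(t)| dt, approximated on a grid."""
    def cdf(a, w):
        return (w * (a[None, :] <= grid[:, None]).float()).sum(-1)
    return torch.trapz((cdf(atoms_p, w_p) - cdf(atoms_q, w_q)).abs(), grid)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = MeasureValuedNet(in_dim=2)
    x = torch.randn(2)                       # a single input point
    atoms, weights = model(x)
    # Compare the predicted measure with an empirical target measure
    # (a stand-in for samples of Y given X = x).
    target_samples = torch.randn(200)
    grid = torch.linspace(-6.0, 6.0, 2000)
    w1 = wasserstein1_1d(atoms, weights,
                         target_samples, torch.full((200,), 1.0 / 200), grid)
    print(f"W1(model(x), empirical target) ~ {w1.item():.3f}")
```

In this reading, learning a regular conditional distribution amounts to fitting the map $x \mapsto \sum_k w_k(x)\,\delta_{y_k}$ so that its Wasserstein-1 distance to the conditional law of $Y$ given $X = x$ is small; the mixture density networks of Bishop (1994) correspond to replacing the Dirac atoms with Gaussian components.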