Supervised learning with probabilistic morphisms and kernel mean embeddings

Abstract

In this paper I propose a concept of a correct loss function in a generative model of supervised learning for an input space $\mathcal{X}$ and a label space $\mathcal{Y}$, which are measurable spaces. A correct loss function in a generative model of supervised learning must correctly measure the discrepancy between elements of a hypothesis space $\mathcal{H}$ of possible predictors and the supervisor operator, which may not belong to $\mathcal{H}$. To define correct loss functions, I propose a characterization of a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ for a probability measure $\mu$ on $\mathcal{X} \times \mathcal{Y}$ relative to the projection $\Pi_{\mathcal{X}}: \mathcal{X} \times \mathcal{Y} \to \mathcal{X}$ as a solution of a linear operator equation. If $\mathcal{Y}$ is a separable metrizable topological space with the Borel $\sigma$-algebra $\mathcal{B}(\mathcal{Y})$, I propose another characterization of a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ as a minimizer of a mean square error on the space of Markov kernels, called probabilistic morphisms, from $\mathcal{X}$ to $\mathcal{Y}$, using kernel mean embeddings. Using these results, and using inner measure to quantify the generalizability of a learning algorithm, I generalize a result of Cucker and Smale on the learnability of a regression model to the setting of a conditional probability estimation problem. I also give a variant of Vapnik's method for solving stochastic ill-posed problems using inner measure, and discuss its applications.
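For orientation, and as standard background rather than a statement of the paper's results: a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ is characterized by the disintegration identity

\[
\mu(A \times B) \;=\; \int_A \mu_{\mathcal{Y}|\mathcal{X}}(B \mid x)\, d\big((\Pi_{\mathcal{X}})_{*}\mu\big)(x)
\qquad \text{for all measurable } A \subseteq \mathcal{X},\ B \subseteq \mathcal{Y},
\]

where $(\Pi_{\mathcal{X}})_{*}\mu$ denotes the marginal of $\mu$ on $\mathcal{X}$. This identity is linear in $\mu_{\mathcal{Y}|\mathcal{X}}$, which indicates the sense in which the conditional measure can be viewed as the solution of a linear operator equation.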
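To illustrate the role of kernel mean embeddings, the following minimal sketch (not taken from the paper; the Gaussian RBF kernel, the function names, and the one-dimensional label space $\mathcal{Y} = \mathbb{R}$ are all illustrative assumptions) computes the squared RKHS distance between the empirical mean embeddings of two samples. A mean-square-error criterion over Markov kernels can be built from such embedding distances, since a characteristic kernel makes the embedding $P \mapsto \mu_P = \int_{\mathcal{Y}} k(y, \cdot)\, dP(y)$ injective.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian RBF kernel matrix: k(a_i, b_j) = exp(-gamma * (a_i - b_j)^2).
    d = a[:, None] - b[None, :]
    return np.exp(-gamma * d ** 2)

def embedding_sq_distance(y, z, gamma=1.0):
    # Squared RKHS distance ||mu_P - mu_Q||^2 between the empirical kernel
    # mean embeddings of samples y ~ P and z ~ Q on Y = R, expanded as
    # E k(y, y') + E k(z, z') - 2 E k(y, z) and estimated with V-statistics.
    return (rbf_kernel(y, y, gamma).mean()
            + rbf_kernel(z, z, gamma).mean()
            - 2.0 * rbf_kernel(y, z, gamma).mean())

# Example: samples from two nearby Gaussians yield a small embedding distance.
rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=1.0, size=500)
z = rng.normal(loc=0.1, scale=1.0, size=500)
print(embedding_sq_distance(y, z))
```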
