Consistent, Two-Stage Sampled Distribution Regression via Mean Embedding

7 February 2014

Z. Szabó

Arthur Gretton

Barnabás Póczos

Bharath K. Sriperumbudur


Abstract

We study the distribution regression problem: regressing to a real-valued response from a probability distribution. Due to the inherent two-stage sampled difficulty of this important machine learning problem (in practice we only have samples from sampled distributions), very little is known about its theoretical properties. In this paper, we propose an algorithmically simple approach to tackle the distribution regression problem: embed the distributions into a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. Our main contribution is to prove that this technique is consistent in the two-stage sampled setting under fairly mild conditions (for probability distributions on Polish, locally compact Hausdorff spaces on which kernels have been defined). The method gives state-of-the-art results on (i) supervised entropy learning and (ii) the prediction problem of aerosol optical depth based on satellite images.
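
The two steps named in the abstract (mean embedding, then ridge regression) can be illustrated with a minimal sketch. This is not the authors' code: it assumes a Gaussian kernel on the sample points and a linear kernel between the empirical mean embeddings, and every name in it (`gaussian_kernel`, `embedding_gram`, `fit_ridge`, `predict`, `bandwidth`, `lam`) is hypothetical. The toy data (bags drawn from unit-variance Gaussians, with the Gaussian's mean as the label) is likewise an assumption chosen only to make the example runnable.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def embedding_gram(bags_a, bags_b, bandwidth=1.0):
    """Inner products between empirical mean embeddings of two lists of bags.

    The inner product of two empirical mean embeddings equals the average
    of the base kernel over all pairs of points from the two bags.
    """
    G = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            G[i, j] = gaussian_kernel(A, B, bandwidth).mean()
    return G


def fit_ridge(bags, y, lam=1e-3, bandwidth=1.0):
    """Kernel ridge regression coefficients on the embedded bags."""
    n = len(bags)
    G = embedding_gram(bags, bags, bandwidth)
    return np.linalg.solve(G + n * lam * np.eye(n), y)


def predict(train_bags, alpha, test_bags, bandwidth=1.0):
    """Predicted responses for new bags."""
    return embedding_gram(test_bags, train_bags, bandwidth) @ alpha


# Toy two-stage sampled data (an assumption for illustration only):
# first sample distributions (their means), then sample points from each.
rng = np.random.default_rng(0)
means = rng.uniform(-1.0, 1.0, size=20)
bags = [rng.normal(m, 1.0, size=(50, 1)) for m in means]

alpha = fit_ridge(bags, means)
preds = predict(bags, alpha, bags[:5])
```

Each bag is observed only through its samples, so the Gram matrix is built from empirical embeddings rather than the true ones; this is the two-stage sampling the abstract refers to. The regularization scaling `n * lam` is one common convention and may differ from the paper's.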
