Universal consistency of Wasserstein $k$-NN classifier: Negative and Positive Results

Abstract

The Wasserstein distance provides a notion of dissimilarity between probability measures and has recently been applied to learning from structured data of varying size, such as images and text documents. In this work, we study the $k$-nearest neighbor ($k$-NN) classifier of probability measures under the Wasserstein distance. We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$. As any Euclidean ball contains a copy of $(0,1)$, one should not expect to obtain universal consistency without some restriction on the base metric space or on the Wasserstein space itself. To this end, via the notion of $\sigma$-finite metric dimension, we show that the $k$-NN classifier is universally consistent on spaces of measures supported in a $\sigma$-uniformly discrete set. In addition, by studying the geodesic structures of the Wasserstein spaces for $p=1$ and $p=2$, we show that the $k$-NN classifier is universally consistent on the space of measures supported on a finite set, the space of Gaussian measures, and the space of measures with densities expressed as finite wavelet series.
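To make the object of study concrete, here is a minimal sketch of a $k$-NN classifier whose inputs are empirical measures on the real line, compared with the Wasserstein distance for $p=1$. It is an illustration only and not the paper's construction: the function names `wasserstein1` and `knn_predict`, the uniform weights on sample points, and the majority-vote rule are all assumptions made for this example. In one dimension, $W_1$ equals the integral of the absolute difference between the two cumulative distribution functions, which is what the code computes.

```python
from collections import Counter


def wasserstein1(u, v):
    """W_1 between the empirical measures of two 1-D samples (uniform weights).

    Uses the 1-D identity W_1 = integral of |F_u(x) - F_v(x)| dx,
    where F_u and F_v are the empirical CDFs. Samples may differ in size.
    """
    su, sv = sorted(u), sorted(v)
    points = sorted(set(su) | set(sv))
    total = 0.0
    i = j = 0  # number of atoms of u (resp. v) that are <= the current left endpoint
    for left, right in zip(points, points[1:]):
        while i < len(su) and su[i] <= left:
            i += 1
        while j < len(sv) and sv[j] <= left:
            j += 1
        # Both CDFs are constant on [left, right).
        total += abs(i / len(su) - j / len(sv)) * (right - left)
    return total


def knn_predict(train, query, k):
    """Majority vote among the k training measures closest to `query` in W_1.

    `train` is a list of (sample, label) pairs; `query` is a sample.
    """
    neighbors = sorted(train, key=lambda pair: wasserstein1(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each measure is a small sample of points in (0, 1).
train = [
    ([0.1, 0.2], 0),
    ([0.15, 0.25, 0.3], 0),
    ([0.8, 0.9], 1),
    ([0.7, 0.85, 0.95], 1),
]
print(knn_predict(train, [0.2, 0.3], k=3))
print(knn_predict(train, [0.75, 0.9], k=3))
```

The samples deliberately have different sizes, reflecting the abstract's point that the Wasserstein distance handles structured data of varying size. A toy run like this says nothing about universal consistency, which concerns the limiting behavior of the classifier's error over all distributions of labeled measures.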
