
From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical Approximation and Statistical Applications

International Conference on Machine Learning (ICML), 2021
Abstract

Statistical distances, i.e., discrepancy measures between probability distributions, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of the smooth framework to high dimensions, we conduct an in-depth study of the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p \geq 1$. We start by showing that $\mathsf{W}_p^{(\sigma)}$ admits a metric structure that is topologically equivalent to classic $\mathsf{W}_p$ and is stable with respect to perturbations in $\sigma$. Moving to statistical questions, we explore the asymptotic properties of $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ i.i.d. samples from $\mu$. To that end, we prove that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th order smooth dual Sobolev norm $\mathsf{d}_p^{(\sigma)}$. Since $\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ coincides with the supremum of an empirical process indexed by Gaussian-smoothed Sobolev functions, it lends itself well to analysis via empirical process theory. We derive the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ in all dimensions $d$, when $\mu$ is sub-Gaussian. Through the aforementioned bound, this implies a parametric empirical convergence rate of $n^{-1/2}$ for $\mathsf{W}_p^{(\sigma)}$, contrasting the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation. When $p=2$, we further show that $\mathsf{d}_2^{(\sigma)}$ can be expressed as a maximum mean discrepancy.
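
For orientation, the smoothing described above is commonly written by convolving both arguments with an isotropic Gaussian before applying the classical distance. The symbols $\mathcal{N}_\sigma$ and $\mathrm{I}_d$ are notational assumptions introduced here for illustration and do not appear in the abstract itself:

$$
\mathsf{W}_p^{(\sigma)}(\mu,\nu) := \mathsf{W}_p\big(\mu * \mathcal{N}_\sigma,\; \nu * \mathcal{N}_\sigma\big),
\qquad
\mathcal{N}_\sigma := \mathcal{N}\big(0, \sigma^2 \mathrm{I}_d\big).
$$

Under this reading, $\sigma = 0$ corresponds to the unsmoothed $\mathsf{W}_p$, and $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$ compares the Gaussian-smoothed empirical measure with the Gaussian-smoothed population measure.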
