
Limit Distribution for Smooth Total Variation and $\chi^2$-Divergence in High Dimensions

Abstract

Statistical divergences are ubiquitous in machine learning as tools for measuring discrepancy between probability distributions. As these applications inherently rely on approximating distributions from samples, we consider empirical approximation under two popular $f$-divergences: the total variation (TV) distance and the $\chi^2$-divergence. To circumvent the sensitivity of these divergences to support mismatch, the framework of Gaussian smoothing is adopted. We study the limit distributions of $\sqrt{n}\,\delta_{\mathsf{TV}}(P_n\ast\mathcal{N}_\sigma, P\ast\mathcal{N}_\sigma)$ and $n\chi^2(P_n\ast\mathcal{N}_\sigma\,\|\,P\ast\mathcal{N}_\sigma)$, where $P_n$ is the empirical measure based on $n$ independently and identically distributed (i.i.d.) observations from $P$, $\mathcal{N}_\sigma := \mathcal{N}(0,\sigma^2\mathrm{I}_d)$, and $\ast$ stands for convolution. In arbitrary dimension, the limit distributions are characterized in terms of a Gaussian process on $\mathbb{R}^d$ with a covariance operator that depends on $P$ and the isotropic Gaussian density of parameter $\sigma$. This, in turn, implies optimality of the $n^{-1/2}$ expected value convergence rates recently derived for $\delta_{\mathsf{TV}}(P_n\ast\mathcal{N}_\sigma, P\ast\mathcal{N}_\sigma)$ and $\chi^2(P_n\ast\mathcal{N}_\sigma\,\|\,P\ast\mathcal{N}_\sigma)$. These strong statistical guarantees promote empirical approximation under Gaussian smoothing as a potent framework for learning and inference based on high-dimensional data.
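To make the smoothed divergences in the abstract concrete, below is a minimal numerical sketch, not taken from the paper, that evaluates $\delta_{\mathsf{TV}}(P_n\ast\mathcal{N}_\sigma, P\ast\mathcal{N}_\sigma)$ and $\chi^2(P_n\ast\mathcal{N}_\sigma\,\|\,P\ast\mathcal{N}_\sigma)$ for a toy one-dimensional example with $P=\mathcal{N}(0,1)$, where $P\ast\mathcal{N}_\sigma$ has a closed form and $P_n\ast\mathcal{N}_\sigma$ is a Gaussian mixture over the sample. The choices of $\sigma$, sample size, and integration grid are arbitrary illustration parameters.

```python
# Hypothetical illustration (not the paper's code): compute the smoothed TV distance
# and chi^2-divergence between P_n * N_sigma and P * N_sigma in d = 1, with P = N(0, 1).
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 1.0                           # smoothing level of N_sigma = N(0, sigma^2); arbitrary choice
n = 2000                              # sample size; arbitrary choice
grid = np.linspace(-8.0, 8.0, 4001)   # 1-D integration grid

# P = N(0, 1), so the smoothed population measure P * N_sigma = N(0, 1 + sigma^2).
q = norm.pdf(grid, loc=0.0, scale=np.sqrt(1.0 + sigma**2))

# P_n * N_sigma is a uniform Gaussian mixture centered at the n sample points.
samples = rng.standard_normal(n)
q_n = norm.pdf(grid[:, None], loc=samples[None, :], scale=sigma).mean(axis=1)

# Smoothed total variation distance: (1/2) * integral of |q_n - q|.
tv = 0.5 * trapezoid(np.abs(q_n - q), grid)

# Smoothed chi^2-divergence: integral of (q_n - q)^2 / q.
chi2 = trapezoid((q_n - q) ** 2 / q, grid)

# The scalings below match the quantities whose limits the paper studies:
# sqrt(n) * TV and n * chi^2 stay of order one as n grows.
print(f"sqrt(n) * delta_TV = {np.sqrt(n) * tv:.3f}")
print(f"n * chi^2          = {n * chi2:.3f}")
```

Grid-based trapezoidal integration is only practical in low dimension; in higher dimensions the same quantities would typically be approximated by Monte Carlo integration against a reference density.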
