  3. 1905.13576

Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

30 May 2019
Ziv Goldfeld, Kristjan Greenewald, Yury Polyanskiy, Jonathan Niles-Weed
Abstract

This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for $\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$, in remarkable contrast to a typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main application we consider estimating the differential entropy $h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown, but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results, we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results demonstrating the significant empirical superiority of the plug-in approach over general-purpose differential entropy estimators are provided.
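
The plug-in estimator discussed above evaluates the differential entropy of the smoothed empirical measure, $h(\hat{P}_n\ast\mathcal{N}_\sigma)$, i.e. of the Gaussian mixture with a component of variance $\sigma^2$ centered at each observed sample. Below is a minimal illustrative sketch of that quantity in NumPy/SciPy, not the authors' implementation: it approximates $h(\hat{P}_n\ast\mathcal{N}_\sigma) = -\mathbb{E}[\log g(Y)]$ by Monte Carlo, drawing $Y$ from the mixture density $g$ and averaging $-\log g(Y)$ with a numerically stable logsumexp. The function name `plugin_entropy`, the Monte Carlo sample size, and the uniform toy distribution in the usage example are assumptions made for illustration and do not come from the paper.

```python
# Illustrative sketch (not the authors' code) of the plug-in estimator
# h(P_hat_n * N(0, sigma^2 I_d)): the differential entropy of the Gaussian
# mixture whose components are centered at the n observed samples.
import numpy as np
from scipy.special import logsumexp

def plugin_entropy(samples, sigma, n_mc=10_000, rng=None):
    """Monte Carlo estimate (in nats) of h(P_hat_n * N(0, sigma^2 I_d))."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = samples.shape

    # Draw Y ~ P_hat_n * N_sigma: pick an observed sample uniformly at random,
    # then add isotropic Gaussian noise with standard deviation sigma.
    idx = rng.integers(0, n, size=n_mc)
    y = samples[idx] + sigma * rng.standard_normal((n_mc, d))

    # log g(y) for the mixture density g = (1/n) * sum_i N(x_i, sigma^2 I_d),
    # computed stably via logsumexp over the n mixture components.
    sq_dists = (y**2).sum(1)[:, None] + (samples**2).sum(1)[None, :] - 2.0 * y @ samples.T
    log_kernel = -sq_dists / (2.0 * sigma**2) - 0.5 * d * np.log(2.0 * np.pi * sigma**2)
    log_g = logsumexp(log_kernel, axis=1) - np.log(n)

    # Differential entropy h(g) = -E[log g(Y)], Y ~ g.
    return -log_g.mean()

# Toy usage (illustrative only): P = Uniform([0, 1]^d), sigma = 1.
rng = np.random.default_rng(0)
d, sigma = 5, 1.0
for n in (100, 1_000):
    x = rng.random((n, d))
    print(n, plugin_entropy(x, sigma, rng=rng))
```

Increasing `n_mc` only reduces the Monte Carlo error of the entropy integral; the statistical error of the plug-in itself is governed by the $e^{O(d)}n^{-\frac{1}{2}}$ rate stated in the abstract.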
