Kernel Thinning

Abstract

We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $O_d(n^{-1/2}(\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
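The worst-case integration error over the reproducing kernel Hilbert space referenced above is the maximum mean discrepancy (MMD). As a minimal illustration of the quantity these guarantees control, the sketch below compares an $n$-point Gaussian sample against two $\sqrt{n}$-point baselines, an i.i.d. subsample and standard (every-$k$-th-point) thinning, under a Gaussian kernel. It does not implement the kernel thinning algorithm itself, which is described in the paper; the bandwidth, sample size, and target distribution are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    # MMD between the empirical measures of X and Y:
    # MMD^2 = mean k(X, X) + mean k(Y, Y) - 2 * mean k(X, Y).
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

rng = np.random.default_rng(0)
n, d = 4096, 2
points = rng.normal(size=(n, d))       # n-point input approximation to P (illustrative)
m = int(np.sqrt(n))                    # sqrt(n)-point coreset size

iid_subsample = points[rng.choice(n, size=m, replace=False)]
standard_thin = points[:: n // m][:m]  # keep every (n/m)-th point

print("MMD(full, i.i.d. subsample):  ", mmd(points, iid_subsample))
print("MMD(full, standard thinning):", mmd(points, standard_thin))
```

In the abstract's terms, both baselines incur $\Omega(n^{-1/4})$ integration error in the worst case, whereas a kernel thinning coreset of the same size achieves the near $n^{-1/2}$ rates quoted above.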
