Kernel Thinning

We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-1/2}\sqrt{(\log n)^{d+1}\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $\mathcal{O}(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d = 2$ through $100$.
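
For intuition, here is a minimal, hedged sketch (plain NumPy) of the $n$-point-to-$\sqrt{n}$-point compression and the MMD comparison against i.i.d. subsampling described above. It is not the paper's algorithm: the greedy, deterministic pair routing in `kernel_halving` is a simplified stand-in for kernel thinning's randomized KT-SPLIT/KT-SWAP steps, and the Gaussian kernel, bandwidth `sigma`, and all function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd(X, Y, sigma=1.0):
    """Maximum mean discrepancy between the empirical distributions of X and Y."""
    return np.sqrt(
        gaussian_kernel(X, X, sigma).mean()
        + gaussian_kernel(Y, Y, sigma).mean()
        - 2.0 * gaussian_kernel(X, Y, sigma).mean()
    )

def kernel_halving(X, sigma=1.0):
    """Split X into two halves whose kernel mass stays balanced.

    Simplified, deterministic stand-in for one halving round: points are
    processed in pairs, and each pair member is routed to the half it is
    currently less attracted to, keeping the two halves close in the RKHS
    (the paper instead uses carefully randomized swapping thresholds).
    """
    n = len(X) - len(X) % 2          # drop a trailing point if n is odd
    K = gaussian_kernel(X[:n], X[:n], sigma)
    half1, half2 = [], []
    for i in range(0, n, 2):
        pull_i = K[i, half1].sum() - K[i, half2].sum()
        pull_j = K[i + 1, half1].sum() - K[i + 1, half2].sum()
        if pull_i <= pull_j:
            half1.append(i); half2.append(i + 1)
        else:
            half2.append(i); half1.append(i + 1)
    return X[half1], X[half2]

def kernel_thin(X, rounds, sigma=1.0):
    """Apply `rounds` halvings, keeping roughly n / 2**rounds points."""
    for _ in range(rounds):
        X, _ = kernel_halving(X, sigma)
    return X

# Compare a kernel-thinned coreset against an equal-sized i.i.d. subsample.
rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 2))                    # n = 1024 points in d = 2
coreset = kernel_thin(X, rounds=5)                    # 5 halvings -> 32 = sqrt(1024) points
iid = X[rng.choice(len(X), size=32, replace=False)]   # i.i.d. baseline of the same size
print("kernel-thinned MMD:", mmd(coreset, X))
print("i.i.d. sample  MMD:", mmd(iid, X))
```

On inputs like this, the balanced halving typically yields a noticeably smaller MMD than the equal-sized i.i.d. subsample, mirroring the $n^{-1/2}$ versus $n^{-1/4}$ gap above, though this toy version carries none of the paper's guarantees.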