Optimal Coreset for Gaussian Kernel Density Estimation

Abstract
Given a point set \(P\subset\mathbb{R}^d\), the kernel density estimate of \(P\) is defined as \[ \overline{\mathcal{G}}_P(x) = \frac{1}{\left|P\right|}\sum_{p\in P}e^{-\left\lVert x-p \right\rVert^2} \] for any \(x\in\mathbb{R}^d\). We study how to construct a small subset \(Q\) of \(P\) such that the kernel density estimate of \(P\) is approximated by the kernel density estimate of \(Q\). This subset \(Q\) is called a coreset. The main technique in this work is to construct a \(\pm 1\) coloring on the point set \(P\) using discrepancy theory, leveraging Banaszczyk's Theorem. When \(d>1\) is a constant, our construction gives a coreset of size \(O\left(\frac{1}{\varepsilon}\right)\), as opposed to the best-known result of \(O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)\). It is the first result to break the barrier of the \(\sqrt{\log}\) factor, even when \(d=2\).
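To make the definition concrete, the following is a minimal Python sketch that evaluates the kernel density estimate from the formula above and measures how far a subset's estimate deviates from that of the full point set. It is only an illustration under stated assumptions: the names `gaussian_kde`, `max_kde_error`, `P`, `Q`, and `X` are hypothetical, the subset is a uniform random sample (a simple baseline, not the discrepancy-based coloring construction of this work), and the error is checked on a finite set of query points rather than on every possible query.

```python
import numpy as np


def gaussian_kde(points, queries):
    """Evaluate (1/|P|) * sum_{p in P} exp(-||x - p||^2) at each query x."""
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise differences via broadcasting: shape (num_queries, num_points, d).
    diff = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Average the kernel values over the point set for each query.
    return np.exp(-sq_dist).mean(axis=1)


def max_kde_error(points, subset, queries):
    """Largest absolute gap between the two estimates over the given queries."""
    gap = gaussian_kde(points, queries) - gaussian_kde(subset, queries)
    return float(np.max(np.abs(gap)))


# Illustrative data (assumption): 2000 Gaussian points in the plane.
rng = np.random.default_rng(0)
P = rng.normal(size=(2000, 2))

# Baseline subset (assumption): uniform random sample, NOT the paper's construction.
Q = P[rng.choice(len(P), size=200, replace=False)]

# Finite set of query points; a coreset guarantee is stated for every query x.
X = rng.uniform(-3.0, 3.0, size=(500, 2))

err = max_kde_error(P, Q, X)
print(f"max observed error on sampled queries: {err:.4f}")
```

Because each kernel value lies in (0, 1], both estimates lie in that range and the measured error is always between 0 and 1. A construction with a better size guarantee would be expected to reach a given error with fewer points than this random baseline.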