Data Structures for Density Estimation

Abstract
We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.