Data Structures for Density Estimation

20 June 2023

Abstract

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$ , and sampling access to a distribution $p$ , identify $v_i$ that is "close" to $p$ . Our main result is the first data structure that, given a sublinear (in $n$ ) number of samples from $p$ , identifies $v_i$ in time sublinear in $k$ . We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$ . The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

View on arXiv

Comments on this paper