Sample complexity of the distinct elements problem

Pengkun Yang
Abstract

We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls from repeated draws. We propose an estimator, based on sampling with replacement, with an additive error guarantee. Its sample complexity is optimal within $O(\log\log k)$ factors, and in fact within constant factors for most accuracy parameters. The optimal sample complexity also applies to sampling without replacement, provided the sample size is a vanishing fraction of the urn size. One of the key auxiliary results is a sharp bound on the minimum singular values of a real rectangular Vandermonde matrix, which might be of independent interest.
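To make the problem setup concrete, the following is a minimal Python sketch of the sampling model only. It is not the estimator proposed in the paper: it uses the naive plug-in baseline (the number of distinct colors actually observed), and the helper names `make_urn`, `draw_with_replacement`, and `naive_distinct_estimate`, as well as the urn sizes, are hypothetical choices made for illustration.

```python
import random


def make_urn(k, num_colors):
    """Hypothetical urn: k balls, colors assigned round-robin (assumes num_colors <= k)."""
    return [i % num_colors for i in range(k)]


def draw_with_replacement(urn, n, rng):
    """Draw n balls uniformly at random, with replacement, and return their colors."""
    return [urn[rng.randrange(len(urn))] for _ in range(n)]


def naive_distinct_estimate(samples):
    """Plug-in baseline (not the paper's estimator): count the colors seen so far."""
    return len(set(samples))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    k = 1000
    urn = make_urn(k, num_colors=200)
    true_distinct = len(set(urn))
    for n in (50, 200, 1000, 5000):
        estimate = naive_distinct_estimate(draw_with_replacement(urn, n, rng))
        # The additive error is the quantity the abstract's guarantee refers to.
        print(f"n={n:5d}  estimate={estimate:4d}  additive error={true_distinct - estimate}")
```

The plug-in count can only undercount, so its additive error shrinks only as the sample grows large enough to reveal every color. This is the kind of baseline against which a sample-complexity guarantee is meaningful.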
