Chebyshev polynomials, moment matching, and optimal estimation of the unseen

We consider the problem of estimating the support size of a discrete distribution whose minimum non-zero mass is at least . Under the independent sampling model, we show that the minimax sample complexity to achieve an additive error of with probability at least 0.5 is within universal constant factors of which improves the state-of-the-art result due to Valiant and Valiant. The optimal procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties. We also study the closely related species problem where the goal is to estimate the number of distinct colors in an urn containing balls from repeated draws. While achieving an additive error proportional to still requires samples, we show that with samples one can strictly outperform a general support size estimator using interpolating polynomials.