Learning Poisson Binomial Distributions

Algorithmica, 2011

13 July 2011

Abstract

We consider a basic problem in unsupervised learning: learning an unknown \emph{Poisson Binomial Distribution} over $\{0,1,\ldots,n\}$. A Poisson Binomial Distribution (PBD) is a sum $X = X_1 + \cdots + X_n$ of $n$ independent Bernoulli random variables which may have arbitrary expectations. We work in a framework where the learner is given access to independent draws from the distribution and must (with high probability) output a hypothesis distribution which has total variation distance at most $\eps$ from the unknown target PBD. As our main result we give a highly efficient algorithm which learns to $\eps$-accuracy using $\tilde{O}(1/\eps^3)$ samples, independent of $n$. The running time of the algorithm is \emph{quasilinear} in the size of its input data, i.e. $\tilde{O}(\log(n)/\eps^3)$ bit operations (observe that each draw from the distribution is a $\log(n)$-bit string). This sample complexity is nearly optimal, since any algorithm must use $\Omega(1/\eps^2)$ samples. We also give positive and negative results for some extensions of this learning problem.
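To make the learning framework concrete, the following is a minimal Python sketch of the setup only, not of the algorithm from the paper. It computes the exact distribution of a small PBD, draws independent samples from it, and measures the total variation distance between the target and a naive empirical-histogram hypothesis. The function names, the example expectations `ps`, and the sample count are all illustrative assumptions.

```python
import random
from collections import Counter


def pbd_pmf(ps):
    """Exact pmf of X = X_1 + ... + X_n, where P(X_i = 1) = ps[i].

    Built by convolving in one Bernoulli variable at a time.
    """
    pmf = [1.0]  # distribution of the empty sum: all mass on 0
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # X_i = 0
            nxt[k + 1] += mass * p       # X_i = 1
        pmf = nxt
    return pmf


def sample_pbd(ps, m, rng):
    """Draw m independent samples from the PBD with expectations ps."""
    return [sum(1 for p in ps if rng.random() < p) for _ in range(m)]


def empirical_pmf(draws, n):
    """Naive hypothesis: the empirical histogram over {0, 1, ..., n}."""
    counts = Counter(draws)
    m = len(draws)
    return [counts.get(k, 0) / m for k in range(n + 1)]


def total_variation(p, q):
    """Total variation distance between two pmfs on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


rng = random.Random(0)                 # fixed seed for reproducibility
ps = [0.1, 0.5, 0.5, 0.9, 0.3]         # hypothetical target expectations
target = pbd_pmf(ps)
draws = sample_pbd(ps, 20000, rng)
hypothesis = empirical_pmf(draws, len(ps))
tv = total_variation(target, hypothesis)
print(f"total variation distance: {tv:.4f}")
```

The empirical histogram used here is only a baseline: for a general distribution over $\{0,1,\ldots,n\}$ its sample requirement grows with $n$, whereas the result stated in the abstract achieves a sample complexity that does not depend on $n$.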
