
Learning Poisson Binomial Distributions

Algorithmica, 2011
Abstract

We consider a basic problem in unsupervised learning: learning an unknown Poisson Binomial Distribution over $\{0,1,\ldots,n\}$. A Poisson Binomial Distribution (PBD) is a sum $X = X_1 + \cdots + X_n$ of $n$ independent Bernoulli random variables which may have arbitrary expectations. We work in a framework where the learner is given access to independent draws from the distribution and must (with high probability) output a hypothesis distribution which has total variation distance at most $\epsilon$ from the unknown target PBD. As our main result we give a highly efficient algorithm which learns to $\epsilon$-accuracy using $\tilde{O}(1/\epsilon^3)$ samples, independent of $n$. The running time of the algorithm is quasilinear in the size of its input data, i.e. $\tilde{O}(\log(n)/\epsilon^3)$ bit-operations (observe that each draw from the distribution is a $\log(n)$-bit string). This is nearly optimal since any algorithm must use $\Omega(1/\epsilon^2)$ samples. We also give positive and negative results for some extensions of this learning problem.
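
To make the learning framework concrete, the following is a minimal Python sketch of the setup only, not the algorithm from the paper. It computes the exact probability mass function of a PBD by convolving its Bernoulli terms, draws independent samples, and measures the total variation distance between the target and a naive empirical-histogram hypothesis. The function names (`pbd_pmf`, `sample_pbd`, `empirical_pmf`, `total_variation`) and the parameter values in the usage note are illustrative assumptions.

```python
import random


def pbd_pmf(ps):
    """Exact PMF of X = X_1 + ... + X_n, where X_i ~ Bernoulli(ps[i]).

    Builds the PMF one Bernoulli term at a time (O(n^2) arithmetic operations).
    """
    pmf = [1.0]  # distribution of the empty sum: all mass on 0
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this term contributes 0
            nxt[k + 1] += mass * p      # this term contributes 1
        pmf = nxt
    return pmf


def sample_pbd(ps, rng):
    """One independent draw from the PBD with expectations ps."""
    return sum(1 for p in ps if rng.random() < p)


def empirical_pmf(draws, n):
    """Naive hypothesis: the normalized histogram of the draws over {0,...,n}."""
    counts = [0] * (n + 1)
    for x in draws:
        counts[x] += 1
    m = len(draws)
    return [c / m for c in counts]


def total_variation(p, q):
    """Total variation distance between two PMFs on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

As a usage example under assumed values, one could set `ps = [0.1, 0.5, 0.9] * 10`, draw `[sample_pbd(ps, rng) for _ in range(5000)]` with `rng = random.Random(0)`, and compare `empirical_pmf(draws, len(ps))` against `pbd_pmf(ps)` using `total_variation`. The histogram estimator is only a baseline for experimenting with the definitions; it does not exploit the PBD structure that the abstract's sample bound relies on.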
