Learning Poisson Binomial Distributions
We consider a basic problem in unsupervised learning: learning an unknown \emph{Poisson Binomial Distribution} over $\{0,1,\dots,n\}$. A Poisson Binomial Distribution (PBD) is a sum $X = X_1 + \dots + X_n$ of $n$ independent Bernoulli random variables which may have arbitrary expectations. We work in a framework where the learner is given access to independent draws from the distribution and must (with high probability) output a hypothesis distribution which has total variation distance at most $\epsilon$ from the unknown target PBD. As our main result we give a highly efficient algorithm which learns to $\epsilon$-accuracy using $\tilde{O}(1/\epsilon^3)$ samples, independent of $n$. The running time of the algorithm is \emph{quasilinear} in the size of its input data, i.e. $\tilde{O}(\log(n)/\epsilon^3)$ bit-operations (observe that each draw from the distribution is a $\log(n)$-bit string). This is nearly optimal since any algorithm must use $\Omega(1/\epsilon^2)$ samples. We also give positive and negative results for some extensions of this learning problem.
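To make the two central definitions concrete, the following is a minimal Python sketch of the exact probability mass function of a PBD and of the total variation distance between two distributions on $\{0,1,\dots,n\}$. It is not the learning algorithm described above; the function names `pbd_pmf` and `total_variation` and the example parameters are illustrative assumptions.

```python
from typing import List, Sequence


def pbd_pmf(probs: Sequence[float]) -> List[float]:
    """Exact pmf of X = X_1 + ... + X_n, where X_i ~ Bernoulli(probs[i]).

    Hypothetical helper for illustration: a simple O(n^2) dynamic program
    that adds one Bernoulli variable at a time.
    """
    pmf = [1.0]  # the empty sum puts all mass on 0
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # the new variable equals 0
            nxt[k + 1] += mass * p      # the new variable equals 1
        pmf = nxt
    return pmf


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between two pmfs."""
    m = max(len(p), len(q))
    p_pad = list(p) + [0.0] * (m - len(p))  # pad the shorter pmf with zeros
    q_pad = list(q) + [0.0] * (m - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p_pad, q_pad))


# Assumed example: a PBD with unequal expectations versus a Binomial(3, 1/2).
target = pbd_pmf([0.1, 0.5, 0.9])
hypothesis = pbd_pmf([0.5, 0.5, 0.5])
print(total_variation(target, hypothesis))
```

In this example the Binomial hypothesis is at total variation distance of about 0.16 from the target, so it would count as an $\epsilon$-accurate hypothesis only for $\epsilon$ of at least roughly that size.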