An (n, k)-Poisson Multinomial Distribution (PMD) is the distribution of the sum of n independent random vectors supported on the set {e_1, …, e_k} of standard basis vectors in ℝ^k. We prove a structural characterization of these distributions, showing that, for all ε > 0, any (n, k)-Poisson multinomial random vector is ε-close, in total variation distance, to the sum of a discretized multidimensional Gaussian and an independent (poly(k/ε), k)-Poisson multinomial random vector. Our structural characterization extends the multi-dimensional CLT of Valiant and Valiant by simultaneously applying to all approximation requirements ε > 0. In particular, it removes from the distance to a multidimensional Gaussian random variable factors depending on n and, importantly, on the minimum eigenvalue of the PMD's covariance matrix. We use our structural characterization to obtain an ε-cover, in total variation distance, of the set of all (n, k)-PMDs, significantly improving the cover size of Daskalakis and Papadimitriou, and obtaining the same qualitative dependence of the cover size on n and ε as the k = 2 cover of Daskalakis and Papadimitriou. We further exploit this structure to show that (n, k)-PMDs can be learned to within ε in total variation distance from Õ_k(1/ε²) samples, which is near-optimal in its dependence on ε and independent of n. In particular, our result generalizes the single-dimensional result of Daskalakis, Diakonikolas, and Servedio for Poisson Binomial distributions to arbitrary dimension.
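To make the definition concrete, a PMD sample over k categories is the coordinate-wise sum of n independent basis-vector draws, i.e. a random histogram of n items into k bins. The sketch below is our own illustration of that definition (the helper name `sample_pmd` and the row-per-summand argument layout are assumptions, not notation from the paper):

```python
import random

def sample_pmd(probabilities):
    """Draw one sample from an (n, k)-Poisson Multinomial Distribution.

    `probabilities` has n rows of length k; row i gives the categorical
    distribution of the i-th independent summand over the standard basis
    vectors e_1, ..., e_k. The returned sample is the coordinate-wise sum
    of the n basis vectors drawn, i.e. a length-k vector of counts.
    """
    k = len(probabilities[0])
    counts = [0] * k
    for row in probabilities:
        # Choose which basis vector the i-th summand equals.
        j = random.choices(range(k), weights=row)[0]
        counts[j] += 1
    return counts

# Example: n = 5 summands over k = 3 categories, each summand uniform.
sample = sample_pmd([[1 / 3, 1 / 3, 1 / 3]] * 5)
```

Note that when the n summands share one common distribution, the PMD reduces to an ordinary multinomial; for k = 2 it is the Poisson Binomial distribution referenced at the end of the abstract.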