
Mixture Decomposition of Distributions using a Decomposition of the Sample Space

Abstract

We consider the set of joint probability distributions of $N$ binary random variables which can be written as a sum of $m$ distributions in the following form: $p(x_1,\ldots,x_N)=\sum_{i=1}^m \alpha_i f_i(x_1,\ldots,x_N)$, where $\alpha_i \geq 0$, $\sum_{i=1}^m \alpha_i = 1$, and the $f_i(x_1,\ldots,x_N)$ belong to some exponential family. For our analysis we decompose the sample space into portions on which the mixture components $f_i$ can be chosen arbitrarily. We derive lower bounds on the number of mixture components from a given exponential family necessary to represent distributions with arbitrary correlations up to a certain order or to represent any distribution. For instance, in the case where the $f_i$ are independent distributions, we show that every distribution $p$ on $\{0,1\}^N$ is contained in the mixture model whenever $m \geq 2^{N-1}$, and furthermore, that there are distributions which are not contained in the mixture model whenever $m < 2^{N-1}$.
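The sufficiency half of the last claim can be illustrated numerically. The sketch below is not taken from the paper; it assumes one particular decomposition of the sample space, namely splitting $\{0,1\}^N$ into $2^{N-1}$ pairs of states that differ only in the last bit. Any distribution supported on such a pair is a product (independent) distribution, so restricting $p$ to each pair yields $m = 2^{N-1}$ product components. The function names (`edge_decomposition`, `product_prob`, `mixture_prob`, `random_distribution`) are hypothetical and chosen only for this example.

```python
import itertools
import random


def edge_decomposition(p, N):
    """Write p on {0,1}^N as a mixture of 2^(N-1) product distributions.

    Assumption of this sketch: the sample space is split into pairs of
    states that differ only in the last bit. Each component is stored as
    a list of marginals P(x_i = 1).
    """
    weights, components = [], []
    for prefix in itertools.product((0, 1), repeat=N - 1):
        x0, x1 = prefix + (0,), prefix + (1,)
        alpha = p[x0] + p[x1]  # mass of p on this pair
        # First N-1 bits are deterministic; the last bit is Bernoulli(q).
        # If the pair has zero mass, q is arbitrary (0.5 used here).
        q = p[x1] / alpha if alpha > 0 else 0.5
        weights.append(alpha)
        components.append([float(b) for b in prefix] + [q])
    return weights, components


def product_prob(marginals, x):
    """Probability of state x under a product distribution."""
    prob = 1.0
    for m, b in zip(marginals, x):
        prob *= m if b == 1 else 1.0 - m
    return prob


def mixture_prob(weights, components, x):
    """Probability of state x under the mixture sum_i alpha_i f_i(x)."""
    return sum(a * product_prob(c, x) for a, c in zip(weights, components))


def random_distribution(N, seed=0):
    """A random strictly positive distribution on {0,1}^N."""
    rng = random.Random(seed)
    states = list(itertools.product((0, 1), repeat=N))
    raw = [rng.random() for _ in states]
    total = sum(raw)
    return {x: r / total for x, r in zip(states, raw)}


N = 3
p = random_distribution(N)
weights, components = edge_decomposition(p, N)
max_error = max(abs(mixture_prob(weights, components, x) - p[x]) for x in p)
print(len(weights), max_error)
```

For $N = 3$ this uses $2^{N-1} = 4$ components and reproduces $p$ up to floating-point error. It only shows that $2^{N-1}$ components suffice; the matching statement that fewer components cannot represent every distribution is not checked by this code.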
