One-Pass Sparsified Gaussian Mixtures

10 March 2019

Abstract

We present a one-pass sparsified Gaussian mixture model (SGMM). Given $P$ -dimensional datapoints $X = \{\mathbf{x}_i\}_{i=1}^N$ , the model fits $K$ Gaussian distributions to $X$ and (softly) classifies each xi to these clusters. After paying an up-front cost of $\mathcal{O}(NP\log P)$ to precondition the data, we subsample $Q$ entries of each datapoint and discard the full $P$ -dimensional data. SGMM operates in $\mathcal{O}(KNQ)$ time per iteration for diagonal or spherical covariances, independent of $P$ , while estimating the model parameters $\theta$ in the full $P$ -dimensional space, making it one-pass and hence suitable for streaming data. We derive the maximum likelihood estimators for $\theta$ in the sparsified regime, demonstrate clustering on synthetic and real data, and show that SGMM is faster than GMM while preserving accuracy.

View on arXiv

Comments on this paper