
Robust Clustering via Parsimonious Mixtures of Contaminated Gaussian Distributions

Abstract

A mixture of contaminated Gaussian distributions is developed for robust model-based clustering. In addition to the usual parameters, each component of our contaminated mixture has one parameter controlling the proportion of outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, which gives our approach a flexibility that is not present in other robust clustering techniques such as trimmed clustering. Accordingly, our contaminated approach can be applied to higher-dimensional data, making it the only robust clustering method that can both identify outliers and be applied in higher dimensions. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation, and various implementation issues are discussed. The performance of this novel family of models is illustrated on artificial and real data.
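To make the two extra per-component parameters concrete, the following is a minimal sketch of a single contaminated Gaussian component. The abstract gives no formula, so it assumes the commonly used form: a two-part mixture sharing one mean, where the second part has its covariance inflated by a factor greater than one. The names `alpha` (proportion of typical points, so `1 - alpha` is the outlier proportion), `eta` (inflation factor, standing in for the degree of contamination), and all function names are illustrative assumptions, not the paper's notation or code.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance per row; solve rather than invert cov.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + maha))

def contaminated_pdf(x, mean, cov, alpha, eta):
    """Assumed form: alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov)."""
    good = alpha * gaussian_pdf(x, mean, cov)
    bad = (1.0 - alpha) * gaussian_pdf(x, mean, eta * cov)
    return good + bad

def outlier_probability(x, mean, cov, alpha, eta):
    """Posterior probability that each row of x came from the inflated part."""
    bad = (1.0 - alpha) * gaussian_pdf(x, mean, eta * cov)
    return bad / contaminated_pdf(x, mean, cov, alpha, eta)

# Hypothetical parameter values, chosen only for illustration.
mean = np.zeros(2)
cov = np.eye(2)
points = np.array([[0.0, 0.0], [5.0, 5.0]])
probs = outlier_probability(points, mean, cov, alpha=0.9, eta=10.0)
```

Under these assumed values, the point at the mean receives a small outlier probability and the distant point a probability close to one. In a fitted model, `alpha` and `eta` would be estimated from the data rather than fixed in advance as they are here.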
