A mixture of contaminated Gaussian distributions is developed for robust mixture model-based clustering. In addition to the usual parameters, each component of our contaminated mixture has a parameter controlling the proportion of outliers, spurious points, or noise (collectively referred to as bad points herein) and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach that is not present in other robust clustering techniques such as trimmed clustering. Moreover, our contaminated approach can both identify bad points and be applied in higher dimensions. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. The performance of this novel family of models is illustrated on artificial and real data.
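As a rough illustration of how a single contaminated Gaussian component can flag bad points, the sketch below uses the univariate case and a common parameterization that is assumed here, since the abstract does not state the density: a proportion `alpha` of good points drawn from a Gaussian with variance `var`, and a proportion `1 - alpha` of bad points drawn from a Gaussian with the same mean and an inflated variance `eta * var`, where `eta > 1`. The function and parameter names are hypothetical, and this is not the authors' implementation.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_pdf(x, mu, var, alpha, eta):
    """Density of a univariate contaminated Gaussian (assumed parameterization).

    alpha: proportion of good points, so 1 - alpha is the proportion of bad points.
    eta:   variance inflation factor (> 1) expressing the degree of contamination.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if eta <= 1.0:
        raise ValueError("eta must be greater than 1")
    good = alpha * normal_pdf(x, mu, var)
    bad = (1.0 - alpha) * normal_pdf(x, mu, eta * var)
    return good + bad


def prob_bad(x, mu, var, alpha, eta):
    """Posterior probability that x came from the inflated (bad) part."""
    bad = (1.0 - alpha) * normal_pdf(x, mu, eta * var)
    return bad / contaminated_pdf(x, mu, var, alpha, eta)


if __name__ == "__main__":
    # Illustrative values only: 10% bad points, tenfold variance inflation.
    for x in (0.0, 2.0, 5.0):
        print(x, prob_bad(x, mu=0.0, var=1.0, alpha=0.9, eta=10.0))
```

Under these assumptions, one natural rule is to label a point as bad when `prob_bad` exceeds 0.5. In a full mixture, each component would carry its own `alpha` and `eta`, estimated from the data rather than fixed in advance.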