A mixture of contaminated Gaussian distributions is developed for model-based clustering. In addition to the parameters of the classical Gaussian mixture, each component of our contaminated mixture has a parameter controlling the proportion of outliers, spurious points, or noise (collectively referred to as bad points herein) and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, which adds flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation, and various implementation issues are discussed. Using a large-scale simulation study, we investigate the behavior of the proposed approach and compare it with finite mixture models of some well-established multivariate elliptical distributions. The performance of this novel family of models is also illustrated on artificial and real data, with particular emphasis on applications in allometric studies.
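To make the two extra per-component parameters concrete, the following is a minimal sketch of a single contaminated Gaussian component. The abstract does not state the density, so the sketch assumes the commonly used form: a two-part mixture sharing one mean, in which a proportion `alpha` of good points follows a Gaussian with covariance `sigma` and the remaining `1 - alpha` of bad points follows a Gaussian with inflated covariance `eta * sigma`, with `eta > 1`. The names `alpha`, `eta`, `gaussian_pdf`, `contaminated_pdf`, and `prob_bad` are illustrative and not taken from the paper.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density evaluated at a single point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance, computed without an explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    log_det = np.linalg.slogdet(sigma)[1]
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)))


def contaminated_pdf(x, mu, sigma, alpha, eta):
    """Assumed contaminated Gaussian density.

    alpha: proportion of good points, in (0, 1) (assumed symbol).
    eta:   covariance inflation factor for bad points, > 1 (assumed symbol).
    """
    sigma = np.asarray(sigma, dtype=float)
    good = alpha * gaussian_pdf(x, mu, sigma)
    bad = (1.0 - alpha) * gaussian_pdf(x, mu, eta * sigma)
    return good + bad


def prob_bad(x, mu, sigma, alpha, eta):
    """Posterior probability that x is a bad point under the assumed density."""
    sigma = np.asarray(sigma, dtype=float)
    bad = (1.0 - alpha) * gaussian_pdf(x, mu, eta * sigma)
    return bad / contaminated_pdf(x, mu, sigma, alpha, eta)


if __name__ == "__main__":
    mu = np.zeros(2)
    sigma = np.eye(2)
    # A point at the mean is very likely good; a distant point is likely bad.
    print(prob_bad([0.0, 0.0], mu, sigma, alpha=0.9, eta=10.0))
    print(prob_bad([5.0, 5.0], mu, sigma, alpha=0.9, eta=10.0))
```

Under this assumed form, a point near the component mean receives a low probability of being bad, while a point far from the mean receives a high one. In a full mixture, `alpha` and `eta` would be estimated for each component along with the usual mixture parameters rather than fixed in advance as they are here.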