
Convex programming approach to robust estimation of a multivariate Gaussian model

Abstract

The multivariate Gaussian distribution is often used as a first approximation to the distribution of high-dimensional data. Determining the parameters of this distribution under various constraints is a widely studied problem in statistics, and is often treated as a prototype for testing new algorithms or theoretical frameworks. In this paper, we develop a nonasymptotic approach to the problem of estimating the parameters of a multivariate Gaussian distribution when the data are corrupted by outliers. We propose an estimator, efficiently computable by solving a convex program, that robustly estimates the population mean and the population covariance matrix even when the sample contains a significant proportion of outliers. In the case where the dimension $p$ of the data points is of smaller order than the sample size $n$, our estimator of the corruption matrix is provably rate optimal simultaneously for the entry-wise $\ell_1$-norm, the Frobenius norm and the mixed $\ell_2/\ell_1$ norm. Furthermore, this optimality is achieved by a penalized square-root-of-least-squares method with a universal tuning parameter (calibrating the strength of the penalization). These results are partly extended to the case where $p$ is potentially larger than $n$, under the additional condition that the inverse covariance matrix is sparse.
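
For reference, the three losses named above can be written out for a generic $n \times p$ matrix $M$ with entries $m_{ij}$. The notation is assumed here for illustration only, and the mixed norm is given in its common row-wise convention, which may differ from the one adopted in the paper:

\[
\|M\|_{1} = \sum_{i=1}^{n}\sum_{j=1}^{p} |m_{ij}|, \qquad
\|M\|_{F} = \Big(\sum_{i=1}^{n}\sum_{j=1}^{p} m_{ij}^{2}\Big)^{1/2}, \qquad
\|M\|_{2,1} = \sum_{i=1}^{n}\Big(\sum_{j=1}^{p} m_{ij}^{2}\Big)^{1/2}.
\]

Similarly, a penalized square-root-of-least-squares criterion has the generic shape below, where $y$, $A$, $\beta$ and the penalty $\mathrm{pen}(\cdot)$ are placeholder symbols rather than the paper's own:

\[
\min_{\beta}\; \|y - A\beta\|_{2} + \lambda\,\mathrm{pen}(\beta).
\]

In the standard square-root lasso, leaving the residual norm unsquared is what allows $\lambda$ to be chosen without knowledge of the noise level. The exact criterion used by the authors is not stated in this abstract.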
