
Sub-Gaussian Mean Estimation in Polynomial Time

Abstract

We study polynomial time algorithms for estimating the mean of a random vector $X$ in $\mathbb{R}^d$ from $n$ independent samples $X_1,\ldots,X_n$ when $X$ may be heavy-tailed. We assume only that $X$ has finite mean $\mu$ and covariance $\Sigma$. In this setting, the radius of the confidence intervals achieved by the empirical mean is large compared to the case where $X$ is Gaussian or sub-Gaussian. In particular, for confidence $\delta > 0$, the empirical mean has confidence intervals with radius of order $\sqrt{\text{Tr}\,\Sigma / (\delta n)}$ rather than $\sqrt{\text{Tr}\,\Sigma / n} + \sqrt{\lambda_{\max}(\Sigma) \log(1/\delta) / n}$ as in the Gaussian case. We offer the first polynomial time algorithm to estimate the mean with sub-Gaussian confidence intervals under such mild assumptions. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators that assumed only the existence of $O(1)$ moments of $X$ either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring $\exp(d)$ time.
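As a rough illustration of the gap between the two confidence radii quoted above, the sketch below evaluates both expressions with all constants dropped, so only the scaling is meaningful. The function names and the identity-covariance setting (d = 100, so the trace of Sigma is 100 and its largest eigenvalue is 1, with n = 10,000 samples) are hypothetical choices made for illustration; this code does not implement the paper's SDP-based estimator.

```python
import math

def empirical_mean_radius(trace, n, delta):
    """Order of the empirical-mean radius: sqrt(Tr(Sigma) / (delta * n)).
    Constants are dropped; only the scaling from the abstract is kept."""
    return math.sqrt(trace / (delta * n))

def sub_gaussian_radius(trace, lam_max, n, delta):
    """Order of the sub-Gaussian radius:
    sqrt(Tr(Sigma) / n) + sqrt(lambda_max(Sigma) * log(1/delta) / n)."""
    return math.sqrt(trace / n) + math.sqrt(lam_max * math.log(1 / delta) / n)

# Hypothetical setting: identity covariance in d = 100 dimensions,
# so Tr(Sigma) = 100 and lambda_max(Sigma) = 1, with n = 10_000 samples.
d, n = 100, 10_000
for delta in (1e-2, 1e-6):
    emp = empirical_mean_radius(d, n, delta)
    sg = sub_gaussian_radius(d, 1.0, n, delta)
    print(f"delta={delta:g}: empirical ~ {emp:.3f}, sub-Gaussian ~ {sg:.3f}")
```

In this assumed setting the loop prints radii of about 1.000 versus 0.121 at delta = 1e-2, and about 100.000 versus 0.137 at delta = 1e-6: the sub-Gaussian radius barely moves as the confidence requirement tightens, while the empirical-mean radius grows by a factor of 100.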
