Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Abstract

We develop two methods for the following fundamental statistical task: given an $\epsilon$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(\epsilon\log\epsilon^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + \epsilon)$-approximate solution in $O(p\log(\tfrac{nd}{\epsilon})\epsilon^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions).
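
To give a rough feel for "iterative filtering", here is a minimal, naive sketch in Python. It is NOT the algorithm from the paper and carries none of the guarantees stated above. It simply alternates power iteration on the empirical second-moment matrix with trimming of the points that have the largest squared projection onto the current direction. Assumptions (all hypothetical): the inliers are mean-zero, the total removal budget is about $2\epsilon n$ points, removals are spread over a fixed number of rounds, and the function names are illustrative only.

```python
import math
import random


def power_iteration(mat, iters=200, seed=0):
    """Approximate the top eigenvector of a symmetric PSD matrix (list of lists)."""
    d = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm0 = math.sqrt(sum(x * x for x in v))
    v = [x / norm0 for x in v]  # start from a random unit vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def second_moment(samples):
    """Empirical second-moment matrix (1/n) * sum_i x_i x_i^T (assumes mean-zero data)."""
    n, d = len(samples), len(samples[0])
    m = [[0.0] * d for _ in range(d)]
    for x in samples:
        for i in range(d):
            for j in range(d):
                m[i][j] += x[i] * x[j] / n
    return m


def filtered_top_eigenvector(samples, eps, rounds=5):
    """Hypothetical naive filter: repeatedly drop the points with the largest
    squared projection onto the current top direction, then recompute it."""
    kept = list(samples)
    budget = math.ceil(2 * eps * len(samples))  # assumed total removal budget
    per_round = max(1, budget // rounds)
    removed = 0
    v = power_iteration(second_moment(kept))
    while removed < budget:
        scores = [sum(a * b for a, b in zip(x, v)) ** 2 for x in kept]
        order = sorted(range(len(kept)), key=lambda i: scores[i])
        k = min(per_round, budget - removed)
        kept = [kept[i] for i in order[:len(kept) - k]]  # drop the k highest scores
        removed += k
        v = power_iteration(second_moment(kept))
    return v
```

On synthetic data where a small fraction of identical outliers is planted far out along a low-variance axis, the untrimmed estimate points toward the outliers, while the trimmed estimate returns to the inliers' top direction. Naive trimming of this kind is not expected to be robust against adversarial corruptions in general.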
