Do Semidefinite Relaxations Really Solve Sparse PCA?

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were suggested for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions such algorithms can recover the sparse principal components. We study this question for a single-spike model, with a spike that is $\ell_0$-sparse, and dimension $p$ and sample size $n$ that tend to infinity. Amini and Wainwright (2009) proved that for sparsity levels $k \geq \Omega(n/\log p)$, no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for sparsity levels $k \leq O(\sqrt{n/\log p})$, diagonal thresholding is asymptotically consistent. It was further conjectured that the SDP approach may close this gap between computational and information limits. We prove that when $k \geq \Omega(\sqrt{n})$, the SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally efficient algorithm can recover a spike of $\ell_0$-sparsity $k \geq \Omega(\sqrt{n})$. Finally, we present empirical results suggesting that up to sparsity levels $k = O(\sqrt{n})$, recovery is possible by a simple covariance thresholding algorithm.
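For concreteness, below is a minimal Python sketch of the single-spike model together with the two thresholding baselines mentioned above. The parameter values ($p$, $n$, $k$, the spike strength `beta`) and the threshold scale `tau` are illustrative assumptions, not values from the paper; the sketch is not the authors' implementation.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper):
# dimension p, sample size n, sparsity k, spike strength beta.
p, n, k, beta = 500, 200, 10, 3.0
rng = np.random.default_rng(0)

# Single-spike model: x_i = sqrt(beta) * u_i * v + z_i, where v is a
# k-sparse unit vector, u_i ~ N(0,1), and z_i ~ N(0, I_p).
v = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
v[support] = 1.0 / np.sqrt(k)
X = np.sqrt(beta) * rng.standard_normal((n, 1)) * v + rng.standard_normal((n, p))

# Empirical covariance matrix.
S = X.T @ X / n

# Diagonal thresholding: keep the k coordinates with the largest sample
# variances (assuming k is known, for simplicity), then take the top
# eigenvector of the corresponding submatrix.
idx = np.argsort(np.diag(S))[-k:]
w, V = np.linalg.eigh(S[np.ix_(idx, idx)])
v_dt = np.zeros(p)
v_dt[idx] = V[:, -1]

# Covariance thresholding: zero out entries of S below a threshold on the
# order of 1/sqrt(n) (an assumed scaling), then take the top eigenvector.
tau = 3.0 / np.sqrt(n)
S_thr = np.where(np.abs(S) >= tau, S, 0.0)
w2, V2 = np.linalg.eigh(S_thr)
v_ct = V2[:, -1]

# Recovery quality: |<v_hat, v>| close to 1 indicates success.
print(abs(v_dt @ v), abs(v_ct @ v))
```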