Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

This paper is concerned with estimating the column subspace of a low-rank matrix from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., ). While the state-of-the-art algorithm emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called , that achieves near-optimal and condition-number-free theoretical guarantees in terms of both and statistical accuracy. The proposed algorithm divides the spectrum of into well-conditioned and mutually well-separated subblocks, and applies to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples -- the factor model and tensor PCA -- leads to remarkable improvement for each application.
View on arXiv