20
4

Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

Abstract

This paper is concerned with estimating the column subspace of a low-rank matrix XRn1×n2\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2} from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., n2n1n_2\gg n_1). While the state-of-the-art algorithm HeteroPCA\textsf{HeteroPCA} emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of X\boldsymbol{X}^\star grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called Deflated-HeteroPCA\textsf{Deflated-HeteroPCA}, that achieves near-optimal and condition-number-free theoretical guarantees in terms of both 2\ell_2 and 2,\ell_{2,\infty} statistical accuracy. The proposed algorithm divides the spectrum of X\boldsymbol{X}^\star into well-conditioned and mutually well-separated subblocks, and applies HeteroPCA\textsf{HeteroPCA} to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples -- the factor model and tensor PCA -- leads to remarkable improvement for each application.

View on arXiv
Comments on this paper