Matrix singular value decomposition (SVD) is popular in statistical data analysis, showing superior efficiency in extracting unknown low-rank singular subspaces embedded in noisy matrix observations. This paper concerns statistical inference for singular subspaces when the noise matrix has i.i.d. entries; our goal is to construct data-dependent confidence regions for the unknown singular subspaces. Our contributions are three-fold. First, we derive an explicit representation formula for the empirical spectral projectors. The formula is neat and holds for deterministic matrix perturbations. Second, we prove a non-asymptotic normal approximation of the projection distance with different levels of bias correction. With higher-order bias corrections, asymptotic normality holds under a signal-to-noise ratio (SNR) requirement depending on the matrix sizes $d_1$ and $d_2$. Third, we propose a shrinkage estimator of the singular values based on recent results from random matrix theory. Based on these estimators, we propose data-dependent centering and normalization factors for the projection distance, for which asymptotic normality is proved under a comparable SNR requirement. Finally, we provide comprehensive simulation results to support our theoretical discoveries. Unlike existing results, our approach is non-asymptotic and the convergence rates are established. Our method allows the rank to diverge with the matrix sizes, and allows the singular values to be all equal, i.e., no eigen-gap condition (beyond the SNR requirement) is required.
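As a minimal illustration of the quantity studied here (a sketch only, not the paper's bias-corrected procedure), the projection distance between the empirical and true left singular subspaces can be computed directly via SVD. The low-rank-plus-noise model, signal strength, and dimensions below are illustrative assumptions:

```python
import numpy as np

# Sketch: rank-r signal plus i.i.d. Gaussian noise (illustrative parameters).
rng = np.random.default_rng(0)
d1, d2, r = 200, 150, 3                               # matrix sizes and rank
U0, _ = np.linalg.qr(rng.standard_normal((d1, r)))    # true left singular subspace
V0, _ = np.linalg.qr(rng.standard_normal((d2, r)))    # true right singular subspace
M = 100.0 * U0 @ V0.T                                 # signal with all singular values equal
X = M + rng.standard_normal((d1, d2))                 # noisy observation

# Empirical left singular subspace from the top-r SVD of the observation.
Uh, s, Vht = np.linalg.svd(X)
Uhat = Uh[:, :r]

# Projection distance: Frobenius norm of the difference of spectral projectors,
# ||Uhat Uhat^T - U0 U0^T||_F (equivalently, sqrt(2) times the sin-theta distance).
dist = np.linalg.norm(Uhat @ Uhat.T - U0 @ U0.T, "fro")
print(dist)
```

At this SNR the empirical subspace is close to the truth, so the printed distance is small; the paper's contribution is to characterize the distribution of this distance, after bias correction, so that confidence regions can be built from data.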