
On the principal components of sample covariance matrices

Abstract

We introduce a class of $M \times M$ sample covariance matrices $\mathcal Q$ which subsumes and generalizes several previous models. The associated population covariance matrix $\Sigma = \mathbb E \mathcal Q$ is assumed to differ from the identity by a matrix of bounded rank. All quantities except the rank of $\Sigma - I_M$ may depend on $M$ in an arbitrary fashion. We investigate the principal components, i.e.\ the top eigenvalues and eigenvectors, of $\mathcal Q$. We derive precise large deviation estimates on the generalized components $\langle \mathbf w, \boldsymbol \xi_i \rangle$ of the outlier and non-outlier eigenvectors $\boldsymbol \xi_i$. Our results also hold near the so-called BBP transition, where outliers are created or annihilated, and for degenerate or near-degenerate outliers. We believe the obtained rates of convergence to be optimal. In addition, we derive the asymptotic distribution of the generalized components of the non-outlier eigenvectors. A novel observation arising from our results is that, unlike the eigenvalues, the eigenvectors of the principal components contain information about the \emph{subcritical} spikes of $\Sigma$. The proofs use several results on the eigenvalues and eigenvectors of the uncorrelated matrix $\mathcal Q$, satisfying $\mathbb E \mathcal Q = I_M$, as input: the isotropic local Marchenko-Pastur law established in [9], level repulsion, and quantum unique ergodicity of the eigenvectors. The latter is a special case of a new universality result for the joint eigenvalue-eigenvector distribution.
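
The BBP transition mentioned above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes the textbook rank-one spiked model $\Sigma = I_M + d\, \mathbf e_1 \mathbf e_1^*$ with Gaussian data, the normalization $\mathcal Q = N^{-1} X X^*$, and the aspect ratio $\phi = M/N$, all of which are choices made here for illustration and may differ from the paper's conventions. The names `simulate_spike` and `predicted` are hypothetical. Under these assumptions the classical limits are an outlier at $(1+d)(1+\phi/d)$ with squared overlap $(1-\phi/d^2)/(1+\phi/d)$ when $d > \sqrt{\phi}$, and a top eigenvalue sticking to the bulk edge $(1+\sqrt{\phi})^2$ otherwise.

```python
import numpy as np


def simulate_spike(d, M=200, N=800, seed=0):
    """Sample Q = X X^T / N with population covariance I + d * e1 e1^T.

    Returns the top eigenvalue of Q and the squared overlap of the top
    eigenvector with the spike direction e1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((M, N))
    # Multiplying the first row by sqrt(1 + d) applies Sigma^{1/2}.
    X[0, :] *= np.sqrt(1.0 + d)
    Q = X @ X.T / N
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    evals, evecs = np.linalg.eigh(Q)
    top_eig = evals[-1]
    overlap_sq = evecs[0, -1] ** 2
    return top_eig, overlap_sq


def predicted(d, phi):
    """Classical large-M limits for the assumed rank-one spiked model."""
    if d > np.sqrt(phi):
        # Supercritical spike: an outlier separates from the bulk.
        return (1 + d) * (1 + phi / d), (1 - phi / d**2) / (1 + phi / d)
    # Subcritical spike: no outlier; the overlap vanishes to leading order.
    return (1 + np.sqrt(phi)) ** 2, 0.0


if __name__ == "__main__":
    M, N = 200, 800
    phi = M / N
    for d in (0.2, 2.0):  # below and above the threshold sqrt(phi) = 0.5
        eig, ov = simulate_spike(d, M, N)
        p_eig, p_ov = predicted(d, phi)
        print(f"d={d}: top eigenvalue {eig:.3f} (limit {p_eig:.3f}), "
              f"overlap^2 {ov:.3f} (limit {p_ov:.3f})")
```

In the subcritical case the leading-order overlap is zero, so this sketch only shows the eigenvalue sticking to the bulk edge; the finer eigenvector information about subcritical spikes described in the abstract lives in lower-order terms that this experiment does not resolve.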
