Asymptotic distribution of principal component scores for pervasive, high-dimensional eigenvectors

Plots of scores from principal component analysis are a popular way to visualize and explore high-dimensional genetic data. However, the inconsistency of high-dimensional eigenvectors has discredited classical principal component analysis and helped motivate sparse principal component analysis, in which the eigenvectors are regularized. Still, classical principal component analysis is extensively and successfully used for data visualization, and our aim is to explain this apparently paradoxical situation. We show that the visual information given by the relative positions of the scores will be consistent, provided the underlying signal can be considered pervasive. First, we argue that pervasive signals lead to eigenvalues scaling linearly with the dimension, and we discuss genetic applications where such pervasive signals are reasonable. Second, we prove, within the high-dimension low-sample-size regime, that when eigenvalues scale linearly with the dimension, the sample component scores appear as scaled and rotated versions of the population scores. As a consequence, the relative positions and visual information conveyed by the score plots remain consistent.
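A minimal simulation sketch (not from the paper) illustrating the claimed phenomenon under illustrative assumptions: a single pervasive component loading on every variable makes the leading eigenvalue grow roughly linearly with the dimension p, and in the high-dimension low-sample-size setting the sample PC1 scores agree with the population scores up to scale and sign, so the relative positions in a score plot are preserved. All names and parameter choices below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 20000  # few samples, many variables (HDLSS regime)

# Population scores: one pervasive component affecting every variable.
scores_pop = rng.normal(size=n)             # latent component scores
loadings = rng.choice([-1.0, 1.0], size=p)  # nonzero loading on all p variables
X = np.outer(scores_pop, loadings) + rng.normal(size=(n, p))  # signal + noise

Xc = X - X.mean(axis=0)

# Leading eigenvalue of the sample covariance scales roughly linearly with p,
# since the signal variance accumulates over all p coordinates.
eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / (n - 1)
print(f"leading eigenvalue / p = {eigvals[0] / p:.2f}")

# Sample PC1 scores: projection onto the first right singular vector.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_sample = U[:, 0] * S[0]

# Up to sign and scale, sample and population scores agree, so the visual
# information in the score plot is consistent.
corr = np.corrcoef(scores_pop, scores_sample)[0, 1]
print(f"|correlation(population, sample PC1 scores)| = {abs(corr):.3f}")
```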