
Optimal Sparse Principal Component Analysis in High Dimensional Elliptical Model

Abstract

We propose a semiparametric sparse principal component analysis method, named elliptical component analysis (ECA), for analyzing high dimensional non-Gaussian data. In particular, we assume the data follow an elliptical distribution. The elliptical family contains many well-known multivariate distributions, such as the multivariate Gaussian, multivariate-t, Cauchy, Kotz, and logistic distributions. It allows extra flexibility in modeling heavy-tailed distributions and in capturing tail dependence between variables. This flexibility makes it well suited to financial, genomic, and bioimaging data, which typically exhibit heavy tails and strong tail dependence. Under a double asymptotic framework where both the sample size n and the dimension d increase, we show that a multivariate rank-based ECA procedure attains the optimal rate of convergence in parameter estimation. This is the first optimality result established for sparse principal component analysis on high dimensional elliptical data.
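The abstract does not spell out the rank statistic. As an illustration, we assume that it is the multivariate (spatial) Kendall's tau. This matrix averages the normalized outer products (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over all pairs of observations. Because only the directions of pairwise differences enter, heavy-tailed radial components such as those of a multivariate-t sample do not distort it, and for elliptical data its eigenvectors coincide with those of the shape matrix. The sparse eigenvector step uses a truncated power method. That method is a swapped-in solver chosen for brevity and is not necessarily the paper's estimator. The helper names `multivariate_kendall_tau`, `truncated_power_method`, and `_truncate` are hypothetical, and the sketch needs NumPy.

```python
import numpy as np


def multivariate_kendall_tau(X):
    """Sample multivariate Kendall's tau of an (n, d) data matrix X.

    Averages (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over all pairs i < j
    (an assumed stand-in for the paper's multivariate rank statistic).
    """
    n, d = X.shape
    K = np.zeros((d, d))
    for i in range(n - 1):
        diffs = X[i + 1:] - X[i]                # differences x_j - x_i for j > i
        norms = np.linalg.norm(diffs, axis=1)
        keep = norms > 0                        # skip exactly tied pairs
        unit = diffs[keep] / norms[keep, None]  # unit-length directions only
        K += unit.T @ unit                      # sum of normalized outer products
    return K * 2.0 / (n * (n - 1))


def _truncate(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest, renormalize."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out / np.linalg.norm(out)


def truncated_power_method(K, k, n_iter=100):
    """Approximate a k-sparse leading eigenvector of a symmetric PSD matrix K."""
    # Initialize from the dense leading eigenvector, truncated to k entries.
    v = _truncate(np.linalg.eigh(K)[1][:, -1], k)
    for _ in range(n_iter):
        v = _truncate(K @ v, k)                 # power step, then hard-threshold
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, df = 500, 20, 3
    v_true = np.zeros(d)
    v_true[:3] = 1 / np.sqrt(3)                 # sparse leading direction
    Sigma = np.eye(d) + 9 * np.outer(v_true, v_true)
    # Multivariate-t sample: Gaussian rows divided by an independent chi-square radius.
    Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    X = Z / np.sqrt(rng.chisquare(df, size=n) / df)[:, None]
    v_hat = truncated_power_method(multivariate_kendall_tau(X), k=3)
    print("support:", np.flatnonzero(v_hat), "alignment:", abs(v_hat @ v_true))
```

Working with pairwise directions costs O(n^2 d^2) time. In exchange, each pair contributes a bounded term regardless of how heavy the tails are, and this robustness is what makes a rank-based estimator attractive for elliptical data.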
