Optimal Sparse Principal Component Analysis in High Dimensional Elliptical Model
We propose a semiparametric sparse principal component analysis method, named elliptical component analysis (ECA), for analyzing high dimensional non-Gaussian data. In particular, we assume the data follow an elliptical distribution. The elliptical family contains many well-known multivariate distributions, such as the multivariate Gaussian, multivariate t, Cauchy, Kotz, and logistic distributions. It allows extra flexibility in modeling heavy-tailed distributions and in capturing tail dependence between variables. This flexibility makes it particularly useful for modeling financial, genomic, and bioimaging data, which typically exhibit heavy tails and high tail dependence. Under a double asymptotic framework where both the sample size and the dimension increase, we show that a multivariate rank based ECA procedure attains the optimal rate of convergence in parameter estimation. This is the first optimality result established for sparse principal component analysis on high dimensional elliptical data.
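The abstract does not spell out the multivariate rank statistic or the sparse estimator, so the following is only an illustrative sketch under two stated assumptions: (1) the rank statistic is taken to be the sample multivariate Kendall's tau matrix, the average of outer products of normalized pairwise differences, and (2) a simple truncated power iteration stands in for the sparse eigenvector estimator. The function names `multivariate_kendall_tau` and `truncated_power_leading_vector`, and the parameters `s` and `n_iter`, are hypothetical and not taken from the paper.

```python
import numpy as np


def multivariate_kendall_tau(X):
    """Sample multivariate Kendall's tau matrix (assumed rank statistic).

    Averages (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over all pairs i < j.
    Normalizing each difference removes the heavy-tailed radial part, which
    is why this statistic is a natural fit for elliptical data.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    K = np.zeros((d, d))
    pairs = 0
    for i in range(n - 1):
        diffs = X[i] - X[i + 1:]               # differences to later rows
        norms = np.linalg.norm(diffs, axis=1)
        keep = norms > 0                       # skip identical observations
        U = diffs[keep] / norms[keep, None]    # unit-length directions
        K += U.T @ U
        pairs += int(keep.sum())
    return K / pairs


def truncated_power_leading_vector(K, s, n_iter=200):
    """Sparse leading eigenvector via truncated power iteration.

    This is a stand-in technique, not necessarily the paper's estimator:
    after each multiplication by K, only the s largest entries in absolute
    value are kept and the vector is renormalized.
    """
    d = K.shape[0]
    v = np.zeros(d)
    v[np.argmax(np.diag(K))] = 1.0             # start at the largest diagonal entry
    for _ in range(n_iter):
        w = K @ v
        drop = np.argsort(np.abs(w))[: d - s]  # indices of the d - s smallest entries
        w[drop] = 0.0
        v = w / np.linalg.norm(w)
    return v
```

Given an `n`-by-`d` data matrix `X`, calling `truncated_power_leading_vector(multivariate_kendall_tau(X), s)` returns a unit vector with at most `s` nonzero entries. The pairwise loop costs on the order of n² d² operations, so this sketch is meant for intuition rather than large-scale use.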