A Generalized Mean Approach for Distributed-PCA

Journal of Computational and Graphical Statistics (JCGS), 2024
Main: 15 pages
2 figures
1 table
Bibliography: 1 page
Appendix: 1 page
Abstract

Principal component analysis (PCA) is a widely used technique for dimension reduction. As datasets continue to grow in size, distributed-PCA (DPCA) has become an active research area. A key challenge in DPCA lies in efficiently aggregating results across multiple machines or computing nodes due to computational overhead. Fan et al. (2019) introduced a pioneering DPCA method to estimate the leading rank-$r$ eigenspace, aggregating local rank-$r$ projection matrices by averaging. However, their method does not utilize eigenvalue information. In this article, we propose a novel DPCA method that incorporates eigenvalue information to aggregate local results via the matrix $\beta$-mean, which we call $\beta$-DPCA. The matrix $\beta$-mean offers a flexible and robust aggregation method through the adjustable choice of $\beta$ values. Notably, for $\beta = 1$, it corresponds to the arithmetic mean; for $\beta = -1$, the harmonic mean; and as $\beta \to 0$, the geometric mean. Moreover, the matrix $\beta$-mean is shown to be associated with the matrix $\beta$-divergence, a subclass of the Bregman matrix divergence, which supports the robustness of $\beta$-DPCA. We also study the stability of eigenvector ordering under eigenvalue perturbation for $\beta$-DPCA. The performance of our proposal is evaluated through numerical studies.
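To make the aggregation idea concrete, here is a minimal sketch of a matrix power-mean aggregator for symmetric positive definite matrices, which reproduces the three special cases the abstract names (arithmetic at $\beta = 1$, harmonic at $\beta = -1$, geometric as $\beta \to 0$). The function names and the power-mean form are illustrative assumptions for exposition; the paper's actual matrix $\beta$-mean is defined via the matrix $\beta$-divergence and may differ in detail.

```python
import numpy as np

def sym_matrix_power(A, p):
    """Matrix power A^p of a symmetric positive definite matrix
    via its eigendecomposition (assumption: A is SPD)."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def matrix_beta_mean(mats, beta):
    """Power-mean-style aggregation of SPD matrices (illustrative sketch).

    beta = 1  -> arithmetic mean
    beta = -1 -> harmonic mean
    beta -> 0 -> geometric (log-Euclidean) mean, handled as a limit
    """
    if abs(beta) < 1e-12:
        # Limiting case: exponentiate the average matrix logarithm.
        decomps = [np.linalg.eigh(A) for A in mats]
        log_avg = sum((V * np.log(w)) @ V.T for w, V in decomps) / len(mats)
        w, V = np.linalg.eigh(log_avg)
        return (V * np.exp(w)) @ V.T
    # General case: average the beta-th matrix powers, then invert the power.
    avg = sum(sym_matrix_power(A, beta) for A in mats) / len(mats)
    return sym_matrix_power(avg, 1.0 / beta)
```

On commuting (e.g. scalar) inputs this reduces to the ordinary scalar power mean, which is an easy sanity check: for the values 2 and 8 it returns 5 (arithmetic), 3.2 (harmonic), and 4 (geometric).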
