DP-PCA: Statistically Optimal and Differentially Private PCA

Abstract

We study the canonical statistical task of computing the principal component from $n$ i.i.d.~data points in $d$ dimensions under $(\varepsilon,\delta)$-differential privacy. Although extensively studied in the literature, existing solutions fall short on two key aspects: (i) even for Gaussian data, existing private algorithms require the number of samples $n$ to scale super-linearly with $d$, i.e., $n = \Omega(d^{3/2})$, to obtain non-trivial results, while non-private PCA requires only $n = O(d)$; and (ii) existing techniques suffer from a non-vanishing error even when the randomness in each data point is arbitrarily small. We propose DP-PCA, a single-pass algorithm that overcomes both limitations. It is based on a private minibatch gradient ascent method that relies on {\em private mean estimation}, which adds the minimal noise required to ensure privacy by adapting to the variance of a given minibatch of gradients. For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n = \tilde{O}(d)$. Furthermore, we provide a lower bound showing that a sub-Gaussian-style assumption is necessary to obtain the optimal error rate.
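To make the algorithmic idea concrete, below is a minimal Python sketch of a single-pass, Oja-style private gradient ascent loop in the spirit the abstract describes. It is an illustration under stated assumptions, not the paper's algorithm: the function name `dp_pca_sketch`, the fixed learning rate, the naive per-batch privacy split, and the median-based adaptive clipping (a non-private stand-in for the paper's private mean estimation subroutine) are all hypothetical choices made for readability.

```python
import numpy as np

def dp_pca_sketch(X, eps, delta, batch_size=500, lr=0.5, rng=None):
    """Single-pass, Oja-style private PCA sketch (illustrative only).

    Simplifications relative to the paper: the clipping center/scale
    below are computed non-privately (the paper privately estimates
    them), and the privacy budget is split across batches by naive
    composition rather than a tighter analysis.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    assert n >= batch_size, "need at least one full minibatch"
    num_batches = n // batch_size
    # Naive privacy budget split over the single pass through the data.
    eps_t, delta_t = eps / num_batches, delta / num_batches

    # Random unit-norm initialization of the eigenvector estimate.
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)

    for t in range(num_batches):
        B = X[t * batch_size:(t + 1) * batch_size]
        # Per-sample gradients of the Rayleigh quotient: g_i = (x_i^T theta) x_i.
        grads = (B @ theta)[:, None] * B

        # Variance-adaptive clipping radius around a robust center.
        # NOTE: computed non-privately here; see docstring.
        center = np.median(grads, axis=0)
        dists = np.linalg.norm(grads - center, axis=1)
        radius = 2.0 * np.median(dists) + 1e-12
        clipped = center + (grads - center) * np.minimum(
            1.0, radius / (dists[:, None] + 1e-12))

        # Gaussian mechanism: replacing one sample moves the clipped
        # mean by at most 2 * radius / batch_size in L2 norm.
        sensitivity = 2.0 * radius / batch_size
        sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta_t)) / eps_t
        noisy_grad = clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

        # Projected gradient ascent step (a fixed learning rate here;
        # the paper uses a carefully chosen step-size schedule).
        theta += lr * noisy_grad
        theta /= np.linalg.norm(theta)
    return theta
```

A quick sanity check of the sketch on synthetic spiked-Gaussian data, where the top eigenvector is known:

```python
rng = np.random.default_rng(0)
d, n = 50, 20000
v = np.eye(d)[0]  # true top eigenvector of the spiked covariance
X = rng.standard_normal((n, d)) + 3.0 * rng.standard_normal((n, 1)) * v
v_hat = dp_pca_sketch(X, eps=1.0, delta=1e-6)
print(abs(v_hat @ v))  # close to 1 when estimation succeeds
```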
