Achieving Differential Privacy with Matrix Masking in Big Data

11 January 2022

Abstract

Differential privacy schemes have been widely adopted in recent years to address data privacy protection. We propose a new Gaussian scheme combined with another data protection technique, random orthogonal matrix masking, to achieve $(\varepsilon, \delta)$-differential privacy (DP) more efficiently. We prove that the additional matrix masking significantly reduces the rate of noise variance required in the Gaussian scheme to achieve $(\varepsilon, \delta)$-DP in the big data setting. Specifically, when $\varepsilon \to 0$, $\delta \to 0$, and the sample size $n$ exceeds the number $p$ of attributes by $\frac{n}{p}=O(\ln(1/\delta))$, the additive noise variance required to achieve $(\varepsilon, \delta)$-DP is reduced from $O(\ln(1/\delta)/\varepsilon^2)$ to $O(1/\varepsilon)$. With much less noise added, the resulting differentially private pseudo data sets allow much more accurate inferences and can thus significantly broaden the scope of application for differential privacy.
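
To make the two ingredients concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes one plausible reading of the scheme: Gaussian noise is added to the $n \times p$ data matrix, and the result is then left-multiplied by a random $n \times n$ orthogonal matrix. The function names (`gaussian_sigma`, `random_orthogonal`, `mask_and_perturb`), the order of the two steps, and the sensitivity value are all illustrative assumptions. The noise scale shown is the classical Gaussian-mechanism calibration, whose variance is of order $\ln(1/\delta)/\varepsilon^2$; the constants behind the reduced $O(1/\varepsilon)$ rate are not stated in the abstract, so they are not reproduced here.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Classical Gaussian-mechanism noise scale (valid for 0 < epsilon < 1).

    The variance sigma**2 grows like ln(1/delta) / epsilon**2, which is the
    baseline rate the abstract compares against.
    """
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the rows accepted so far.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip numerically degenerate draws
            rows.append([a / norm for a in v])
    return rows


def mask_and_perturb(X, sigma, rng):
    """Hypothetical release: Q @ (X + E), with E i.i.d. N(0, sigma**2).

    Assumption: noise is added before masking; the paper may order or
    parameterize these steps differently.
    """
    n, p = len(X), len(X[0])
    noisy = [[x + rng.gauss(0.0, sigma) for x in row] for row in X]
    Q = random_orthogonal(n, rng)
    return [
        [sum(Q[i][k] * noisy[k][j] for k in range(n)) for j in range(p)]
        for i in range(n)
    ]


def gram(M):
    """Return M^T M, the p x p matrix of column inner products."""
    p = len(M[0])
    return [[sum(r[a] * r[b] for r in M) for b in range(p)] for a in range(p)]
```

Because $Q^\top Q = I$, the masked matrix $QY$ has the same Gram matrix $Y^\top Y$ as the unmasked noisy data $Y$, so statistics that depend on the data only through $Y^\top Y$ are affected by the added noise but not by the masking step itself.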
