Differentially Private Hypothesis Testing, Revisited

Abstract

How should one statistically analyze privacy-enhanced data? In theory, one could process it exactly as if it were normal data, since many differentially private algorithms asymptotically converge exponentially fast to their non-private counterparts and/or have error that asymptotically decreases as fast as sampling error. In practice, convergence often requires enormous amounts of data. Thus, making differential privacy practical requires the development of techniques that specifically account for the noise that is added for the sake of providing privacy guarantees. Such techniques are especially needed for statistical hypothesis testing. Previous approaches either ignored the added noise (resulting in highly biased p-values), accounted for the noise but had high variance, or accounted for the noise with small variance but were restricted to very specific types of data sets. In this paper, we propose statistical tests of independence that address all three problems simultaneously: they add small amounts of noise, account for this noise to produce accurate p-values, and have no restrictions on the types of tables to which they are applicable. Along with these tests, we propose an alternative methodology for computing the asymptotic distributions of test statistics that yields better finite-sample approximations when differential privacy is used to protect data.
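The abstract does not spell out the tests' mechanics, but the core idea, adding a small amount of noise and then accounting for it when computing the p-value, can be illustrated with a Monte Carlo sketch. The snippet below is a hypothetical illustration, not the paper's actual test or its asymptotic-distribution methodology: the functions `noisy_counts`, `chi_sq`, and `dp_independence_pvalue`, the sensitivity-2 Laplace mechanism, and the parametric-bootstrap null are all assumptions made for this example. The key point it demonstrates is that each simulated null table receives fresh privacy noise, so the reference distribution reflects both sampling error and the privacy mechanism.

```python
import numpy as np

def noisy_counts(table, epsilon, rng):
    # Laplace mechanism on cell counts. Assumes L1 sensitivity 2
    # (replacing one individual's record moves one unit between two
    # cells of the contingency table).
    return table + rng.laplace(scale=2.0 / epsilon, size=table.shape)

def chi_sq(table):
    # Pearson chi-squared statistic; noisy counts can be negative,
    # so clamp at zero before forming the expected cell counts.
    t = np.clip(table, 0.0, None)
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / max(t.sum(), 1e-9)
    return ((t - expected) ** 2 / np.clip(expected, 1e-9, None)).sum()

def dp_independence_pvalue(table, epsilon, n_sim=2000, seed=0):
    """Monte Carlo p-value for a test of independence that accounts
    for the privacy noise: every simulated null table is re-noised,
    so the null distribution includes the privacy mechanism."""
    rng = np.random.default_rng(seed)
    n = int(table.sum())
    released = noisy_counts(table.astype(float), epsilon, rng)
    observed_stat = chi_sq(released)

    # Null model of independence: estimate cell probabilities as the
    # product of marginals of the released (clamped) table, then
    # resample complete tables and add fresh noise to each.
    t = np.clip(released, 0.0, None)
    p_cells = np.outer(t.sum(axis=1), t.sum(axis=0)).ravel()
    p_cells /= p_cells.sum()

    exceed = sum(
        chi_sq(noisy_counts(
            rng.multinomial(n, p_cells).reshape(table.shape).astype(float),
            epsilon, rng)) >= observed_stat
        for _ in range(n_sim)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)

if __name__ == "__main__":
    counts = np.array([[30.0, 10.0], [12.0, 28.0]])  # toy 2x2 table
    print(dp_independence_pvalue(counts, epsilon=1.0))
```

Re-noising the simulated tables is what keeps the p-value roughly calibrated even at small epsilon; comparing the noisy statistic against the standard chi-squared distribution instead, i.e., ignoring the added noise, is the first failure mode the abstract describes and inflates the Type I error.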
