Private Query Release Assisted by Public Data

Abstract

We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $\alpha$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of the private and public sample complexities. First, we show that we can solve the problem for any query class $\mathcal{H}$ of finite VC-dimension using only $d/\alpha$ public samples and $\sqrt{p}d^{3/2}/\alpha^2$ private samples, where $d$ and $p$ are the VC-dimension and dual VC-dimension of $\mathcal{H}$, respectively. In comparison, with only private samples, this problem cannot be solved even for simple query classes with VC-dimension one, and without any private samples, a larger public sample of size $d/\alpha^2$ is needed. Next, we give sample complexity lower bounds that exhibit tight dependence on $p$ and $\alpha$. For the class of decision stumps, we give a lower bound of $\sqrt{p}/\alpha$ on the private sample complexity whenever the public sample size is less than $1/\alpha^2$. Given our upper bounds, this shows that the dependence on $\sqrt{p}$ is necessary in the private sample complexity. We also give a lower bound of $1/\alpha$ on the public sample complexity for a broad family of query classes, which by our upper bound, is tight in $\alpha$.
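
For quick reference, the bounds stated in the abstract can be collected side by side. This is only a restatement, not an additional result: the symbols $n_{\mathrm{pub}}$ and $n_{\mathrm{priv}}$ for the public and private sample sizes are notation assumed here, and constants, logarithmic factors, and any dependence on the privacy parameters are left out because the abstract does not state them.

```latex
% Requires amsmath. n_pub and n_priv are notation introduced for this summary only.
\begin{align*}
\text{Upper bound (any $\mathcal{H}$ of finite VC-dimension):}\quad
  & n_{\mathrm{pub}} \approx \frac{d}{\alpha},
    \qquad n_{\mathrm{priv}} \approx \frac{\sqrt{p}\, d^{3/2}}{\alpha^{2}} \\
\text{Public samples only:}\quad
  & n_{\mathrm{pub}} \approx \frac{d}{\alpha^{2}} \\
\text{Private samples only:}\quad
  & \text{not solvable, even when } d = 1 \\
\text{Lower bound (decision stumps, } n_{\mathrm{pub}} < 1/\alpha^{2}\text{):}\quad
  & n_{\mathrm{priv}} \gtrsim \frac{\sqrt{p}}{\alpha} \\
\text{Lower bound (broad family of query classes):}\quad
  & n_{\mathrm{pub}} \gtrsim \frac{1}{\alpha}
\end{align*}
```

Read this way, adding private samples reduces the public sample requirement from $d/\alpha^2$ to $d/\alpha$, a saving of a factor $1/\alpha$.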
