Differentially Private Sampling via Reveal-or-Obscure
We introduce a differentially private (DP) algorithm called Reveal-or-Obscure (ROO) to generate a single representative sample from a dataset of n i.i.d. observations from an unknown distribution. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves -differential privacy by choosing whether to "reveal" or "obscure" the empirical distribution with a fixed probability . While our proposed mechanism is structurally identical to an algorithm proposed by Cheu and Nayak, we prove a strictly better bound on the sampling complexity than that established in their theorem. Building on this framework, we propose a novel generalized sampler called Data-Specific ROO (DS-ROO), where the obscuring probability is a function of the empirical distribution. We show that when the dataset contains enough samples from every element of the alphabet, DS-ROO can achieve -DP while obscuring much less. In addition, we provide tight upper bounds on the utility of DS-ROO in terms of total variation distance. Our results show that under the same privacy budget, DS-ROO can achieve better utility than state-of-the-art private samplers and vanilla ROO, with total variation distance decaying exponentially in dataset size .
View on arXiv