Differentially Private Sampling via Reveal-or-Obscure

20 April 2025

Naima Tasnim

Atefeh Gilani

Lalitha Sankar

O. Kosut


Main: 7 Pages

4 Figures

Bibliography: 1 Page

Abstract

We introduce a differentially private (DP) algorithm called Reveal-or-Obscure (ROO) to generate a single representative sample from a dataset of $n$ i.i.d. observations from an unknown distribution. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves $\epsilon$-differential privacy by choosing whether to "reveal" or "obscure" the empirical distribution with a fixed probability $q$. While our proposed mechanism is structurally identical to an algorithm proposed by Cheu and Nayak, we prove a strictly better bound on the sampling complexity than that established in their theorem. Building on this framework, we propose a novel generalized sampler called Data-Specific ROO (DS-ROO), where the obscuring probability $q$ is a function of the empirical distribution. We show that when the dataset contains enough samples from every element of the alphabet, DS-ROO can achieve $\epsilon$-DP while obscuring much less. In addition, we provide tight upper bounds on the utility of DS-ROO in terms of total variation distance. Our results show that under the same privacy budget, DS-ROO can achieve better utility than state-of-the-art private samplers and vanilla ROO, with total variation distance decaying exponentially in dataset size $n$.
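The reveal-or-obscure idea can be illustrated with a minimal Python sketch. The abstract does not spell out what "obscure" means or how $q$ is set from $\epsilon$, so the sketch makes two labeled assumptions: obscuring means drawing uniformly from a finite alphabet, and $q$ is passed in by the caller rather than derived from a privacy budget. The names `roo_sample` and `roo_output_distribution` are hypothetical and are not taken from the paper or its code.

```python
import random
from collections import Counter


def roo_sample(data, alphabet, q, rng=random):
    """Draw one sample in a reveal-or-obscure style.

    ASSUMPTION: "obscure" means sampling uniformly from `alphabet`;
    "reveal" means sampling from the empirical distribution, i.e.
    returning a uniformly chosen data point. `q` is the obscuring
    probability and is supplied by the caller.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if rng.random() < q:
        return rng.choice(alphabet)  # obscure: ignore the data
    return rng.choice(data)          # reveal: empirical distribution


def roo_output_distribution(data, alphabet, q):
    """Exact output distribution of roo_sample under the same assumption:
    a mixture (1 - q) * empirical + q * uniform over the alphabet."""
    counts = Counter(data)
    n, k = len(data), len(alphabet)
    return {a: (1.0 - q) * counts[a] / n + q / k for a in alphabet}


data = ["a", "a", "b", "a"]
alphabet = ["a", "b", "c"]
dist = roo_output_distribution(data, alphabet, q=0.5)
sample = roo_sample(data, alphabet, q=0.5, rng=random.Random(0))
```

Under this assumed mixture, a symbol that never appears in the data (here `"c"`) still has output probability $q/k$, where $k$ is the alphabet size. That is the intuition for why a larger $q$ hides more about any single record, at the cost of moving the output further from the empirical distribution.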
