Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity

Abstract

We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with that of the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection: it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min\{\varepsilon^2,1\}}\right)$ on the sample complexity of non-interactive hypothesis selection, and it does so using only $\Theta(\log\log k)$ rounds of interaction. To prove our results, we define the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA), which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of the queries it asks. We then design an LDP algorithm that uses a small number of critical queries.
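To make the selection objective concrete, the following is a minimal, non-private sketch (not the paper's $\varepsilon$-LDP algorithm): it estimates the empirical distribution from samples and returns the hypothesis closest to it in total variation distance. The function names and the discrete-support setting are illustrative assumptions.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions:
    half the L1 distance between their probability vectors."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum()

def select_hypothesis(samples, hypotheses, support_size):
    """Non-private illustration of hypothesis selection: pick the
    hypothesis minimizing TV distance to the empirical distribution.
    (The paper's algorithm instead works under local DP, where no
    party ever sees the raw samples.)"""
    counts = np.bincount(samples, minlength=support_size)
    empirical = counts / counts.sum()
    distances = [tv_distance(empirical, f) for f in hypotheses]
    return int(np.argmin(distances))

# Toy usage: the unknown h is much closer to hypotheses[1] than hypotheses[0].
rng = np.random.default_rng(0)
h = [0.1, 0.2, 0.7]
hypotheses = [[0.5, 0.3, 0.2], [0.1, 0.25, 0.65]]
samples = rng.choice(3, size=5000, p=h)
print(select_hypothesis(samples, hypotheses, 3))
```

With 5000 samples the empirical distribution concentrates near $h$, so the selector reliably returns the index of the closer hypothesis; the paper's contribution is achieving a comparable guarantee while each sample is accessed only through an $\varepsilon$-LDP randomizer.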
