
Differentially Private Kernel Density Estimation

Jerry Yao-Chieh Hu
Zhao Song
Han Liu
Main: 4 pages · 3 figures · 3 tables · Appendix: 32 pages
Abstract

We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE), offering not only an improved privacy-utility tradeoff but also better efficiency than prior results. Specifically, we study the following problem: given a similarity function $f$ (or DP KDE) and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query $y \in \mathbb{R}^d$, we approximate $\sum_{x \in X} f(x, y)$ in a differentially private fashion. The best previous algorithm for $f(x,y) = \|x - y\|_1$ is the node-contaminated balanced binary tree of [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. Their algorithm requires $O(nd)$ space and time for preprocessing, where $n = |X|$. For any query point, the query time is $d \log n$, with an error guarantee of a $(1+\alpha)$-approximation and additive error $\epsilon^{-1} \alpha^{-0.5} d^{1.5} R \log^{1.5} n$.

In this paper, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects:

- We reduce the query time by a factor of $\alpha^{-1} \log n$.
- We improve the approximation ratio from $\alpha$ to 1.
- We reduce the error dependence by a factor of $\alpha^{-0.5}$.

From a technical perspective, our method of constructing the search tree differs from the previous work [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. In the prior work, for each query, the answer is split into $\alpha^{-1} \log n$ numbers, each derived from the summation of $\log n$ values obtained from interval-tree counting. In contrast, we construct the tree differently, splitting the answer into $\log n$ numbers, where each is a smart combination of two distance values, two counting values, and $y$ itself. We believe our tree structure may be of independent interest.
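To make the "combination of counts, sums, and $y$" idea concrete, here is a minimal 1-D sketch (not the paper's exact construction, and the class name, noise calibration, and discretization are all our own assumptions): a segment tree over a bucketed domain stores a Laplace-noised point count and a Laplace-noised coordinate sum per node. A query $y$ is then answered from $O(\log n)$ canonical sibling nodes, where a node with count $c$ and sum $s$ contributes $y \cdot c - s$ if it lies entirely left of $y$ and $s - y \cdot c$ if entirely right, since $\sum_{x \le y} (y - x) = y \cdot c - s$.

```python
import math
import random

def laplace(rng, scale):
    """Sample Laplace(0, scale) via inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

class DPDistanceSumTree:
    """Illustrative sketch of a DP estimator for sum_x |x - y| over a 1-D
    domain [lo, hi]; names and calibration are assumptions, not the paper's."""

    def __init__(self, points, lo, hi, eps, leaves=1024, seed=0):
        assert leaves & (leaves - 1) == 0, "leaves must be a power of two"
        rng = random.Random(seed)
        self.lo, self.hi, self.leaves = lo, hi, leaves
        self.c = [0.0] * (2 * leaves)  # per-node point counts (noised below)
        self.s = [0.0] * (2 * leaves)  # per-node coordinate sums (noised below)
        for x in points:
            b = min(leaves - 1, int((x - lo) / (hi - lo) * leaves))
            self.c[leaves + b] += 1.0
            self.s[leaves + b] += x
        for i in range(leaves - 1, 0, -1):  # build internal nodes bottom-up
            self.c[i] = self.c[2 * i] + self.c[2 * i + 1]
            self.s[i] = self.s[2 * i] + self.s[2 * i + 1]
        # One point touches `depth` nodes in each array; split eps between the
        # count tree (sensitivity 1 per node) and the sum tree (sensitivity B).
        depth = leaves.bit_length()
        B = max(abs(lo), abs(hi))
        for i in range(1, 2 * leaves):
            self.c[i] += laplace(rng, 2.0 * depth / eps)
            self.s[i] += laplace(rng, 2.0 * depth * B / eps)

    def query(self, y):
        """Estimate sum_x |x - y| from O(log n) noisy canonical nodes."""
        b = min(self.leaves - 1,
                int((y - self.lo) / (self.hi - self.lo) * self.leaves))
        i = self.leaves + b
        # Crude leaf term for points sharing y's bucket (off by at most one
        # bucket width per such point).
        est = abs(self.s[i] - y * self.c[i])
        while i > 1:
            if i % 2 == 1:  # left sibling: points entirely left of y's bucket
                est += y * self.c[i - 1] - self.s[i - 1]
            else:           # right sibling: points entirely right of y's bucket
                est += self.s[i + 1] - y * self.c[i + 1]
            i //= 2         # climb one level; siblings partition the rest
        return est
```

With a very large `eps` the noise vanishes and the estimate matches the exact distance sum up to discretization, which makes the canonical-node decomposition easy to sanity-check before tightening the privacy budget.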
