
Achieving differential privacy for k-nearest neighbors based outlier detection by data partitioning

Abstract

When outlier detection is applied to sensitive data, mechanisms that guarantee the privacy of the underlying data are needed. The k-nearest neighbors (k-NN) algorithm is a simple method and one of the most effective for outlier detection. So far, no attempts have been made to develop a differentially private (ϵ-DP) approach for k-NN based outlier detection. Existing approaches often relax the notion of ϵ-DP and employ methods other than k-NN. We propose a method for k-NN based outlier detection that separates the procedure into a fitting step on reference inlier data and a subsequent step that applies the outlier classifier to new data. We achieve ϵ-DP for both the fitting algorithm and the outlier classifier with respect to the reference data by partitioning the dataset into a uniform grid, which yields low global sensitivity. Our approach yields nearly optimal performance on real-world data with varying dimensions when compared to the non-private versions of k-NN.
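
To make the role of the uniform grid concrete, the following is a minimal sketch of a generic Laplace-noised grid histogram. It is not the k-NN procedure proposed in the paper, whose details the abstract does not give. The function names `private_grid_counts` and `is_outlier`, the public data domain [0, 1]^d, the add-or-remove-one-point notion of neighboring datasets, and the count threshold are all assumptions made for illustration.

```python
import numpy as np

def private_grid_counts(reference, bins_per_dim, epsilon, rng=None):
    """Return Laplace-noised cell counts over a uniform grid on [0, 1]^d.

    Assumption: the domain [0, 1]^d is public, and neighboring datasets
    differ by adding or removing one reference point.
    """
    rng = np.random.default_rng() if rng is None else rng
    reference = np.asarray(reference, dtype=float)
    d = reference.shape[1]
    # Map each point to its integer cell index in every dimension.
    cells = np.clip((reference * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    counts = np.zeros((bins_per_dim,) * d)
    np.add.at(counts, tuple(cells.T), 1)
    # One point affects exactly one cell count by 1, so the L1 sensitivity
    # of the count vector is 1 and Laplace noise of scale 1/epsilon suffices.
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def is_outlier(point, noisy_counts, bins_per_dim, threshold):
    """Flag a new point whose grid cell has a noisy count below threshold.

    This only post-processes the noisy counts, so it keeps the same
    privacy guarantee with respect to the reference data.
    """
    cell = np.clip((np.asarray(point, dtype=float) * bins_per_dim).astype(int),
                   0, bins_per_dim - 1)
    return bool(noisy_counts[tuple(cell)] < threshold)
```

In this sketch, the fitting step touches the reference data only through the noisy counts, which mirrors the split between fitting and classification described above. A coarser grid lowers the relative impact of the noise per cell but also blurs the boundary between inliers and outliers.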
