
Differentially Private Random Decision Forests using Smooth Sensitivity

Abstract

We propose and prove a new sensitivity bound for the differentially private query "what is the most frequent item in set x?". To do so, we use the idea of "smooth sensitivity", which takes into account the specific data used in the query rather than assuming the worst-case scenario. Differential privacy is a strong mathematical model that offers privacy guarantees to every person in the data. We then apply our proposed sensitivity to a forest of randomly built decision trees, querying each leaf node to output the most frequent class label. We also extend prior work on the optimal depth of random decision trees so that the theory handles continuous features, not just discrete ones. This, along with several other improvements, allows us to create a differentially private decision forest with substantially higher predictive power than the current state-of-the-art. Our findings generalize to machine learning applications beyond decision trees: if privacy is a concern and the query can be phrased in terms of the most (or least) frequent item in a set, we prove that this query is very insensitive and can output accurate answers under strong privacy requirements.
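To make the "most frequent item" query concrete, here is a minimal sketch of how a leaf node could release a class label with the standard Exponential Mechanism. It is an illustration only, not the method of this paper: it assumes the worst-case (global) sensitivity of 1 for label counts, which is the baseline that a smooth sensitivity bound aims to improve on. The function name `private_most_frequent` and its parameters are hypothetical.

```python
import math
import random
from collections import Counter


def private_most_frequent(labels, candidates, epsilon, sensitivity=1.0, rng=None):
    """Pick a label with probability proportional to exp(eps * count / (2 * sensitivity)).

    Assumption: `sensitivity` defaults to 1.0, the worst-case change in any
    label count when one record is added or removed. A tighter, data-dependent
    bound (such as the smooth sensitivity proposed in the paper) is not
    reproduced here.
    """
    rng = rng or random.Random()
    counts = Counter(labels)
    # The utility of each candidate label is its count in the leaf.
    scores = [epsilon * counts.get(c, 0) / (2.0 * sensitivity) for c in candidates]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    weights = [math.exp(s - max_score) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    leaf_labels = ["yes"] * 40 + ["no"] * 10
    print(private_most_frequent(leaf_labels, ["yes", "no"], epsilon=1.0))
```

With a smaller sensitivity value, the same privacy budget `epsilon` concentrates more probability on the true majority label, which is why a tighter sensitivity bound translates into more accurate leaf predictions.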
