Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering

We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information from publicly accessible data without using statistics. We consider the problem of generating and releasing a quantization of the public data that minimizes the privacy leakage of the sensitive information to the released quantization while maintaining a certain level of utility (or, inversely, limiting the quantization loss). The sensitive and public variables are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction and the refined information (also called the maximin information) of the sensitive information. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization by iteratively merging elements in the alphabet of the public data. To instantiate the solution to this problem, we consider two specific utility measures: the worst-case resolution of the public data when observing the released quantization, and the maximal distortion of the released data. We show that the value of the maximin information can be determined by dividing the confusability graph into connected subgraphs. Hence, the maximin information can be reduced by merging nodes that connect these subgraphs. The relation to probabilistic information-theoretic privacy is also studied by noting that the Gács-Körner common information is the stochastic version of the maximin information and indicates the attainability of statistical indistinguishability.
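As an illustration of the statement that the maximin information can be determined by dividing the confusability graph into connected subgraphs, the following minimal sketch counts connected components with a union-find structure and then merges alphabet elements greedily. It is not the algorithm proposed in the paper. The function names `maximin_information` and `greedy_merge`, the base-2 logarithm, the representation of the joint range as a list of feasible (sensitive, public) pairs, and the cap on cluster size used as a stand-in for a utility constraint are all assumptions made for this example.

```python
import math
from itertools import combinations


def maximin_information(pairs, quantizer):
    """Return log2 of the number of connected components of the bipartite
    graph that links each sensitive value s to the cluster label of every
    public value x it can co-occur with (assumed graph representation)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, x in pairs:
        root_s = find(("s", s))
        root_c = find(("c", quantizer[x]))
        parent[root_s] = root_c  # union the two components
    roots = {find(node) for node in list(parent)}
    return math.log2(len(roots))


def greedy_merge(pairs, alphabet, max_cluster_size):
    """Greedy agglomerative sketch: repeatedly merge the pair of clusters
    that lowers the leakage most, subject to a hypothetical cluster-size cap.
    Stops at a local optimum, when no admissible merge reduces the leakage."""
    quantizer = {x: x for x in alphabet}  # start from the identity quantization
    while True:
        current = maximin_information(pairs, quantizer)
        best = None
        labels = sorted(set(quantizer.values()))
        for a, b in combinations(labels, 2):
            size = sum(1 for v in quantizer.values() if v in (a, b))
            if size > max_cluster_size:
                continue  # merge would violate the assumed utility constraint
            trial = {x: (a if v == b else v) for x, v in quantizer.items()}
            leak = maximin_information(pairs, trial)
            if leak < current and (best is None or leak < best[0]):
                best = (leak, trial)
        if best is None:
            return quantizer, current
        quantizer = best[1]


# Toy joint range: s = 0 is compatible with public values 'a' and 'b', etc.
pairs = [(0, "a"), (0, "b"), (1, "c"), (2, "d")]
quantizer, leakage = greedy_merge(pairs, ["a", "b", "c", "d"], max_cluster_size=2)
```

In this toy example the identity quantization yields three connected subgraphs, and merging public values that lie in different subgraphs joins those subgraphs, which is what lowers the computed leakage.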