
Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering

Abstract

We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information $S$ from publicly accessible data $X$ without using statistics. We consider the problem of generating and releasing a quantization $\hat{X}$ of $X$ to minimize the privacy leakage of $S$ to $\hat{X}$ while maintaining a certain level of utility (or, inversely, the quantization loss). The variables $S$ and $X$ are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction $L_0(S \rightarrow \hat{X})$ and the refined information $I_*(S; \hat{X})$ (also called the maximin information) of $S$. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization solution $\hat{X}$ by iteratively merging elements in the alphabet of $X$. To instantiate the solution to this problem, we consider two specific utility measures: the worst-case resolution of $X$ by observing $\hat{X}$, and the maximal distortion of the released data $\hat{X}$. We show that the value of the maximin information $I_*(S; \hat{X})$ can be determined by dividing the confusability graph into connected subgraphs. Hence, $I_*(S; \hat{X})$ can be reduced by merging nodes that connect subgraphs. The relation to probabilistic information-theoretic privacy is also studied by noting that the Gács-Körner common information is the stochastic version of $I_*$ and indicates the attainability of statistical indistinguishability.
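To illustrate the connected-subgraph view of the maximin information described above, the following is a minimal sketch (not the authors' implementation). It assumes the non-stochastic relation between $S$ and the released $\hat{X}$ is given as a hypothetical set-valued map `consistent`, where `consistent[x]` is the set of $S$-values that can co-occur with the symbol `x`; two symbols are confusable when these sets overlap, and the connected components of the resulting confusability graph govern $I_*(S; \hat{X})$, so merging symbols that bridge components reduces it.

```python
# Sketch: group released symbols into connected subgraphs of the
# confusability graph induced by overlapping S-consistency sets.
from itertools import combinations

def confusability_components(consistent):
    """Return the connected subgraphs (as sets of symbols) of the confusability graph."""
    symbols = list(consistent)
    parent = {x: x for x in symbols}          # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Connect two symbols whenever their S-consistency sets overlap.
    for a, b in combinations(symbols, 2):
        if consistent[a] & consistent[b]:
            union(a, b)

    groups = {}
    for x in symbols:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# Toy example (hypothetical data): four released symbols, three sensitive values.
consistent = {
    "x1": {"s1"},
    "x2": {"s1", "s2"},
    "x3": {"s2"},
    "x4": {"s3"},
}
print(confusability_components(consistent))
# -> [{'x1', 'x2', 'x3'}, {'x4'}]: merging a symbol from each component
#    (e.g. quantizing x3 and x4 into one released symbol) joins the two
#    subgraphs and thereby reduces the maximin information about S.
```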
