Probabilistic Reasoning with LLMs for k-anonymity Estimation

12 March 2025

Abstract

Probabilistic reasoning is a key aspect of both human and artificial intelligence that enables handling uncertainty and ambiguity in decision-making. In this paper, we introduce a novel numerical reasoning task under uncertainty, focusing on estimating the k-anonymity of user-generated documents containing privacy-sensitive information. We propose BRANCH, which uses LLMs to factorize a joint probability distribution to estimate the k-value (the size of the population matching the given information) by modeling individual pieces of textual information as random variables. The probability of each factor occurring within a population is estimated using standalone LLMs or retrieval-augmented generation systems, and these probabilities are combined into a final k-value. Our experiments show that this method successfully estimates the correct k-value 67% of the time, an 11% improvement over GPT-4o chain-of-thought reasoning. Additionally, we leverage LLM uncertainty to develop prediction intervals for k-anonymity, which include the correct value in nearly 92% of cases.
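To make the idea of combining per-factor probabilities into a k-value concrete, here is a minimal Python sketch. It assumes a chain-rule factorization k ≈ N · ∏ P(x_i | x_1, ..., x_{i-1}) over a known population size N, where each conditional probability has already been estimated (for example, by an LLM). The `Factor` class, the `estimate_k` function, and the interval rule used here (multiplying all lower bounds together and all upper bounds together) are hypothetical illustrations, not the paper's BRANCH implementation or its prediction-interval method.

```python
import math
from dataclasses import dataclass


@dataclass
class Factor:
    """One piece of textual information, treated as a random variable (hypothetical structure)."""
    description: str
    p: float       # point estimate of P(this factor | all previous factors)
    p_low: float   # assumed lower bound, e.g. derived from LLM uncertainty
    p_high: float  # assumed upper bound


def estimate_k(population: int, factors: list[Factor]) -> tuple[float, float, float]:
    """Return (k_low, k_point, k_high) via a chain-rule product over the factors.

    Assumption: k = N * prod_i P(x_i | x_1..x_{i-1}); the interval simply
    multiplies all lower bounds and all upper bounds (a conservative rule).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    log_n = math.log(population)
    log_lo, log_k, log_hi = log_n, log_n, log_n
    for f in factors:
        if not (0.0 < f.p_low <= f.p <= f.p_high <= 1.0):
            raise ValueError(f"invalid probabilities for factor {f.description!r}")
        # Sum logs instead of multiplying to avoid underflow with many small factors.
        log_lo += math.log(f.p_low)
        log_k += math.log(f.p)
        log_hi += math.log(f.p_high)
    # The document's author matches their own description, so k is at least 1.
    return (max(1.0, math.exp(log_lo)),
            max(1.0, math.exp(log_k)),
            max(1.0, math.exp(log_hi)))


if __name__ == "__main__":
    # Purely illustrative numbers, not data from the paper.
    factors = [
        Factor("lives in a particular city", 0.1, 0.05, 0.2),
        Factor("works as a nurse, given the city", 0.5, 0.4, 0.6),
    ]
    print(estimate_k(1_000_000, factors))
```

Working in log space keeps the product numerically stable when a document reveals many low-probability attributes, and clamping at 1 reflects that at least one person (the author) matches the description.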

@article{zheng2025_2503.09674,
  title={Probabilistic Reasoning with LLMs for k-anonymity Estimation},
  author={Jonathan Zheng and Sauvik Das and Alan Ritter and Wei Xu},
  journal={arXiv preprint arXiv:2503.09674},
  year={2025}
}