Probabilistic Reasoning with LLMs for k-anonymity Estimation
Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a novel numerical reasoning task under uncertainty, focusing on estimating the k-anonymity of user-generated documents containing privacy-sensitive information. We propose BRANCH, which uses LLMs to factorize a joint probability distribution to estimate the k-value (the size of the population matching the given information) by modeling individual pieces of textual information as random variables. The probability of each factor occurring within a population is estimated using standalone LLMs or retrieval-augmented generation systems, and these probabilities are combined into a final k-value. Our experiments show that this method successfully estimates the correct k-value 67% of the time, an 11% increase compared to GPT-4o chain-of-thought reasoning. Additionally, we leverage LLM uncertainty to develop prediction intervals for k-anonymity, which include the correct value in nearly 92% of cases.
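The combination step described in the abstract can be illustrated with a minimal sketch. It assumes the joint distribution is factorized with the chain rule and that the k-value is a base population size multiplied by the per-factor conditional probabilities; the abstract does not spell out the exact formula, so this is an illustration and not the BRANCH implementation. The function name `estimate_k`, the population size, and the probabilities are all hypothetical; in the paper's setting each probability would come from an LLM or a retrieval-augmented system.

```python
from math import prod


def estimate_k(population_size: float, conditional_probs: list[float]) -> float:
    """Estimate k as N * P(x1) * P(x2 | x1) * ... (chain-rule factorization).

    Assumption: each entry of conditional_probs is already conditioned on
    the factors that precede it, so a plain product gives the joint probability.
    """
    for p in conditional_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return population_size * prod(conditional_probs)


# Hypothetical example: a base population of 1,000,000 people and three
# disclosed attributes with made-up conditional probabilities.
k = estimate_k(1_000_000, [0.5, 0.1, 0.02])
print(round(k))
```

A smaller k means fewer people match the disclosed details, so the author is easier to re-identify.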
@article{zheng2025_2503.09674,
  title={Probabilistic Reasoning with LLMs for k-anonymity Estimation},
  author={Jonathan Zheng and Sauvik Das and Alan Ritter and Wei Xu},
  journal={arXiv preprint arXiv:2503.09674},
  year={2025}
}