ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.21035
33
56

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

28 April 2025
Rui Xin
Niloofar Mireshghallah
Shuyue Stella Li
Michael Duan
Hyunwoo Kim
Yejin Choi
Yulia Tsvetkov
Sewoong Oh
Pang Wei Koh
ArXivPDFHTML
Abstract

Sanitizing sensitive text data typically involves removing personally identifiable information (PII) or generating synthetic data under the assumption that these methods adequately protect privacy; however, their effectiveness is often only assessed by measuring the leakage of explicit identifiers but ignoring nuanced textual markers that can lead to re-identification. We challenge the above illusion of privacy by proposing a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information -- such as routine social activities -- can be used to infer sensitive attributes like age or substance use history from sanitized data. For instance, we demonstrate that Azure's commercial PII removal tool fails to protect 74\% of information in the MedQA dataset. Although differential privacy mitigates these risks to some extent, it significantly reduces the utility of the sanitized text for downstream tasks. Our findings indicate that current sanitization techniques offer a \textit{false sense of privacy}, highlighting the need for more robust methods that protect against semantic-level information leakage.

View on arXiv
@article{xin2025_2504.21035,
  title={ A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage },
  author={ Rui Xin and Niloofar Mireshghallah and Shuyue Stella Li and Michael Duan and Hyunwoo Kim and Yejin Choi and Yulia Tsvetkov and Sewoong Oh and Pang Wei Koh },
  journal={arXiv preprint arXiv:2504.21035},
  year={ 2025 }
}
Comments on this paper