v1v2 (latest)

A Customized Text Sanitization Mechanism with Differential Privacy

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

4 July 2022

Cen Chen

Chengyu Wang

ArXiv (abs)PDF HTML Github (7★)

Abstract

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good trade-offs between privacy and utility. To address the above limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $\epsilon$ -differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.

View on arXiv

Comments on this paper