Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning Application

Organizations are collecting vast amounts of data, but they often lack the capabilities needed to fully extract insights. As a result, they increasingly share data with external experts, such as analysts or researchers, to gain value from it. However, this practice introduces significant privacy risks. Various techniques have been proposed to address privacy concerns in data sharing. However, these methods often degrade data utility, impacting the performance of machine learning (ML) models. Our research identifies key limitations in existing optimization models for privacy preservation, particularly in handling categorical variables, and evaluating effectiveness across diverse datasets. We propose a novel multi-objective optimization model that simultaneously minimizes information loss and maximizes protection against attacks. This model is empirically validated using diverse datasets and compared with two existing algorithms. We assess information loss, the number of individuals subject to linkage or homogeneity attacks, and ML performance after anonymization. The results indicate that our model achieves lower information loss and more effectively mitigates the risk of attacks, reducing the number of individuals susceptible to these attacks compared to alternative algorithms in some cases. Additionally, our model maintains comparable ML performance relative to the original data or data anonymized by other methods. Our findings highlight significant improvements in privacy protection and ML model performance, offering a comprehensive and extensible framework for balancing privacy and utility in data sharing.
View on arXiv@article{wei2025_2501.01002, title={ Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning Application }, author={ Yusi Wei and Hande Y. Benson and Joseph K. Agor and Muge Capan }, journal={arXiv preprint arXiv:2501.01002}, year={ 2025 } }