The Data Sharing Paradox of Synthetic Data in Healthcare

Abstract

Synthetic data offers a promising solution to privacy concerns in healthcare by generating useful datasets in a privacy-aware manner. However, although synthetic data is typically developed with the intention of being shared, ambiguous reidentification risk assessments often prevent it from seeing the light of day. One of the main causes is that privacy metrics for synthetic data, which indicate reidentification risk, are not well aligned with practical requirements and regulations regarding data sharing in healthcare. This article discusses the paradoxical situation in which synthetic data is designed for data sharing but often remains restricted. We also discuss how the field should move forward to mitigate this issue.

@article{achterberg2025_2503.20847,
  title={The Data Sharing Paradox of Synthetic Data in Healthcare},
  author={Jim Achterberg and Bram van Dijk and Saif ul Islam and Hafiz Muhammad Waseem and Parisis Gallos and Gregory Epiphaniou and Carsten Maple and Marcel Haas and Marco Spruit},
  journal={arXiv preprint arXiv:2503.20847},
  year={2025}
}