All Papers

Advances in machine learning are paving the way for the artificial generation of high-quality images and videos. In this paper, however, we show that generating synthetic samples with generative models can lead to information leakage, i.e., an adversary might infer information about individuals whose data is used to train the models. To this end, we train a Generative Adversarial Network (GAN), which combines a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets by relying on the discriminator's capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, and show how to improve the latter using limited auxiliary knowledge of samples in the dataset. We test our attacks on several state-of-the-art models, such as Deep Convolutional GAN (DCGAN), Boundary Equilibrium GAN (BEGAN), and the combination of DCGAN with a Variational Autoencoder (DCGAN+VAE), using datasets consisting of complex representations of faces (LFW), objects (CIFAR-10), as well as medical images (Diabetic Retinopathy). The white-box attacks are 100% successful at inferring which samples were used to train the target model, and the black-box ones can infer training set membership with over 80% accuracy.
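As a rough illustration of the white-box setting described in the abstract, the sketch below assumes the adversary has direct access to the target GAN's discriminator and uses it to score candidate samples, flagging the highest-scoring ones as likely training-set members; the intuition is that an overfit discriminator assigns systematically higher confidence to samples it saw during training. This is a minimal PyTorch sketch, not the paper's exact procedure, and the names `discriminator`, `candidates`, and `n_members` are hypothetical placeholders for the attacker's inputs.

```python
import torch


def whitebox_membership_attack(discriminator, candidates, n_members):
    """Rank candidate samples by the target discriminator's output and
    predict that the top-scoring ones were part of the training set.

    discriminator: the target GAN's discriminator network (white-box access).
    candidates:    tensor of samples the adversary wants to test.
    n_members:     how many samples to flag as predicted training members.
    """
    discriminator.eval()
    with torch.no_grad():
        # Higher score = discriminator is more confident the sample is "real".
        scores = discriminator(candidates).view(-1)

    # Samples the discriminator is most confident about are flagged as members.
    ranked = torch.argsort(scores, descending=True)
    predicted_members = ranked[:n_members]
    return predicted_members, scores
```

In the black-box setting described above, the adversary has no such direct access; one workaround sketched in the abstract is to train a surrogate GAN on samples generated by (or partially known from) the target model and then apply the same discriminator-scoring idea to the surrogate.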