
Investigating Distributional Robustness: Semantic Perturbations Using Generative Models

Abstract

In many situations, the i.i.d. assumption cannot be relied upon; training datasets are not representative of the full range of inputs that will be encountered during deployment. Especially in safety-critical applications such as autonomous driving or medical devices, it is essential that we can trust our models to remain performant. In this paper, we introduce a new method for perturbing the semantic features of images (e.g. shape, location, texture, and colour) in order to evaluate classifiers' robustness to these changes. We produce these perturbations by performing small adjustments to the latent activation values of a trained generative adversarial network (GAN), leveraging its ability to represent diverse semantic properties of an image. We find that state-of-the-art classifiers are not robust to subtle shifts in the semantic features of input data, and that adversarial training against pixel-space perturbations is not just unhelpful: it is counterproductive.
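As a rough illustration of the idea described in the abstract, the sketch below (not the authors' implementation; the `generator`, `classifier`, and random-noise latent perturbation are assumptions standing in for the paper's method) nudges a GAN's latent code by a small amount and measures how often the classifier's prediction on the generated image changes.

```python
# Minimal sketch, assuming a pretrained GAN generator and image classifier.
# The random latent perturbation here is a placeholder for the paper's
# targeted adjustments to latent activation values.
import torch

def semantic_perturbation_probe(generator, classifier, z, epsilon=0.05, n_steps=20):
    """Perturb the latent code z and report the fraction of perturbations
    that change the classifier's predicted label."""
    with torch.no_grad():
        base_label = classifier(generator(z)).argmax(dim=-1)
        flips = 0
        for _ in range(n_steps):
            z_perturbed = z + epsilon * torch.randn_like(z)  # small latent shift
            label = classifier(generator(z_perturbed)).argmax(dim=-1)
            flips += int((label != base_label).item())
    return flips / n_steps
```

A low flip rate under such small latent shifts would indicate robustness to the semantic variation the generator encodes; a high rate would reflect the fragility the paper reports.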
