Visible Progress on Adversarial Images and a New Saliency Map

Abstract

Many machine learning classifiers are vulnerable to adversarial perturbations. An adversarial perturbation modifies an input to change the prediction of a classifier without causing the input to appear substantially different to the human perceptual system. We make progress on this AI Safety problem as it relates to image classification by training on images after a simple conversion to the YUV colorspace. We demonstrate that adversarial perturbations which modify YUV images are more conspicuous and less pathological than those in RGB space. We then show that whitening RGB images lets us see the difference between fooling and benign images. Last, we introduce a new saliency map to better understand misclassification.
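
As an illustration of the colorspace conversion mentioned above, here is a minimal sketch of an RGB-to-YUV transform. The abstract does not say which YUV definition is used, so the BT.601 analog coefficients below are an assumption, and the function names `rgb_to_yuv` and `image_to_yuv` are hypothetical helpers rather than anything taken from the paper.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to YUV.

    Assumption: BT.601 analog YUV coefficients; the paper may use
    a different variant of the transform.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted brightness
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return (y, u, v)


def image_to_yuv(image):
    """Convert an image given as rows of (r, g, b) tuples to YUV."""
    return [[rgb_to_yuv(*pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    # A tiny 1x2 image: one pure-red pixel and one mid-gray pixel.
    tiny = [[(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]
    for row in image_to_yuv(tiny):
        for y, u, v in row:
            print(f"Y={y:.3f} U={u:.3f} V={v:.3f}")
```

Under this transform, any gray pixel has essentially zero chroma, so all of its information sits in the luma channel. A classifier trained on such inputs would receive brightness and color on separate channels.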
