
RAB: Provable Robustness Against Backdoor Attacks

IEEE Symposium on Security and Privacy (IEEE S&P), 2020
Abstract

Recent studies have shown that deep neural networks are highly vulnerable to adversarial attacks, including evasion and backdoor attacks. On the defense side, there has been intensive interest in provable robustness against evasion attacks, while robustness guarantees against backdoor attacks remain largely lacking. In this paper, we focus on certifying model robustness against general threat models. We first provide a unified framework via randomized smoothing and show that it can be instantiated to certify robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to certify model robustness against backdoor attacks. We theoretically prove the robustness bound for machine learning models trained with this process, prove that the bound is tight, and derive robustness conditions for Gaussian and uniform smoothing distributions. Moreover, we evaluate the certified robustness of a family of smoothed models trained in a differentially private fashion and show that they achieve better certified robustness bounds. In addition, we theoretically show that it is possible to train robust smoothed models efficiently for simple models such as K-nearest-neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution. Empirically, we conduct comprehensive experiments with different machine learning models, including DNNs, differentially private DNNs, and K-NN models, on the MNIST, CIFAR-10, and ImageNet datasets, and provide the first benchmark for certified robustness against backdoor attacks. We also evaluate K-NN models on the spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. The theoretical analysis and the comprehensive benchmark across diverse ML models and datasets shed light on further robust learning strategies against general adversarial attacks.
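To make the smoothing idea concrete, the sketch below illustrates one plausible instantiation of randomized smoothing over training data, in the spirit of the framework the abstract describes: train several models on noise-perturbed copies of the (possibly poisoned) training set and aggregate their predictions by majority vote. The function name `smooth_predict`, the `train_fn` interface, and all parameter values are assumptions for illustration, not the paper's actual RAB implementation; in particular, the certified radius computation from the vote margin is omitted here.

```python
import numpy as np

def smooth_predict(train_fn, X_train, y_train, x_test,
                   sigma=1.0, n_models=100, seed=0):
    """Hypothetical sketch of training-data randomized smoothing.

    train_fn(X, y) is assumed to return an object with a .predict(X) method.
    Returns the majority-vote (smoothed) prediction for a single test point.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_models):
        # Smoothing step: add isotropic Gaussian noise to every training example.
        X_noisy = X_train + rng.normal(scale=sigma, size=X_train.shape)
        model = train_fn(X_noisy, y_train)
        label = model.predict(x_test.reshape(1, -1))[0]
        votes[label] = votes.get(label, 0) + 1
    # A certification procedure would derive a robustness bound from the vote
    # margin; this sketch only returns the smoothed prediction itself.
    return max(votes, key=votes.get)
```

In the paper's setting, the vote counts over the noise-trained models are what allow a certified bound on how much the training set could have been perturbed (e.g., by backdoor triggers) without changing the smoothed prediction.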
