Defending against Adversarial Attack towards Deep Neural Networks via
Collaborative Multi-task Training
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples that contain imperceptible perturbations. A series of defending methods, either proactive defence or reactive defence, have been proposed in recent years. However, most of these methods can only handle specific attacks. For example, proactive defending methods are invalid against grey-box or white-box attacks, while reactive defending methods are challenged by low-distortion adversarial examples or transferring adversarial examples. This becomes a critical problem since a defender usually does not have prior knowledge of the attack type. Moreover, two-pronged defences (e.g. MagNet), which take advantage of both proactive and reactive methods, have been reported as broken under transferring attacks. To address this problem, this paper proposes a novel defensive framework based on collaborative multi-task training, aiming to provide defence against different types of attacks. The proposed defence first encodes training labels into label pairs and counters black-box attacks by leveraging adversarial training supervised by the encoded label pairs. The defence further constructs a detector to identify and reject high-confidence adversarial examples that bypass the black-box defence. In addition, the proposed collaborative architecture can prevent adversaries from finding valid adversarial examples even when the defending strategy is exposed. To the best of our knowledge, our method is a new two-pronged defence that is resilient to the transferring attack targeting MagNet.
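The abstract does not give implementation details, but the collaborative multi-task structure it describes (a classifier hardened by adversarial training plus a detector that rejects adversarial examples) can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the FGSM attack, the two-head architecture, the loss weighting, and all names (`MultiTaskNet`, `fgsm`, `train_step`, `det_weight`) are assumptions, and the paper's label-pair encoding is not reproduced here.

```python
# Minimal sketch of a collaborative multi-task defence: a shared backbone,
# a classification head trained adversarially, and a detector head that
# learns to flag adversarial inputs. All architectural choices and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 128), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, num_classes)  # task 1: label prediction
        self.detector = nn.Linear(128, 2)              # task 2: clean vs. adversarial

    def forward(self, x):
        h = self.backbone(x)
        return self.classifier(h), self.detector(h)

def fgsm(model, x, y, eps=0.1):
    """One-step FGSM perturbation used here to generate adversarial training data."""
    x = x.clone().detach().requires_grad_(True)
    logits, _ = model(x)
    loss = F.cross_entropy(logits, y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y, det_weight=0.5):
    x_adv = fgsm(model, x, y)
    optimizer.zero_grad()
    # Classification loss on clean and adversarial inputs (adversarial training).
    logits_clean, det_clean = model(x)
    logits_adv, det_adv = model(x_adv)
    cls_loss = F.cross_entropy(logits_clean, y) + F.cross_entropy(logits_adv, y)
    # Detector loss: label 0 = clean input, label 1 = adversarial input.
    zeros = torch.zeros(len(x), dtype=torch.long, device=x.device)
    ones = torch.ones(len(x), dtype=torch.long, device=x.device)
    det_loss = F.cross_entropy(det_clean, zeros) + F.cross_entropy(det_adv, ones)
    loss = cls_loss + det_weight * det_loss
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = MultiTaskNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(8, 1, 28, 28)            # dummy MNIST-sized batch
    y = torch.randint(0, 10, (8,))
    print(train_step(model, opt, x, y))
```

Because both heads share one backbone, the detection objective shapes the same representation the classifier uses, which is one plausible reading of why the abstract calls the architecture "collaborative"; at inference time, inputs the detector flags as adversarial would be rejected rather than classified.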