Learnable Boundary Guided Adversarial Training

IEEE International Conference on Computer Vision (ICCV), 2020
Abstract

Previous adversarial training methods improve model robustness at the cost of accuracy on natural data. In this paper, we aim to reduce this degradation in natural accuracy. We use the logits of a clean model $\mathcal{M}^{natural}$ to guide the learning of a robust model $\mathcal{M}^{robust}$, on the premise that the logits of a well-trained clean model embed the most discriminative features of natural data, e.g., a generalizable classifier boundary. Our solution is to constrain the logits of $\mathcal{M}^{robust}$, which takes adversarial examples as input, to be similar to those of $\mathcal{M}^{natural}$ fed with the corresponding natural data. This lets $\mathcal{M}^{robust}$ inherit the classifier boundary of $\mathcal{M}^{natural}$; we therefore name our method Boundary Guided Adversarial Training (BGAT). Moreover, we generalize BGAT to Learnable Boundary Guided Adversarial Training (LBGAT) by training $\mathcal{M}^{natural}$ and $\mathcal{M}^{robust}$ simultaneously and collaboratively, so that they learn the most robustness-friendly classifier boundary and achieve the strongest robustness. Extensive experiments are conducted on CIFAR-10, CIFAR-100, and the challenging Tiny ImageNet datasets. When combined with other state-of-the-art adversarial training approaches, e.g., Adversarial Logit Pairing (ALP) and TRADES, the performance is further enhanced.
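
The logit constraint described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the exact form of the loss, so everything specific here is an assumption made for illustration: the squared-error distance between logits, the cross-entropy term on the clean model, the weight `gamma`, and the function names `boundary_guidance_loss`, `cross_entropy`, and `lbgat_style_objective`. Generating the adversarial examples is left out entirely.

```python
import numpy as np


def boundary_guidance_loss(robust_logits_adv, natural_logits_clean):
    """Squared distance between the two sets of logits, averaged over the batch.

    Assumption: a squared-error distance; the abstract only says the logits
    should be made "similar".
    """
    diff = robust_logits_adv - natural_logits_clean
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def cross_entropy(logits, labels):
    """Numerically stable softmax cross-entropy, averaged over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def lbgat_style_objective(robust_logits_adv, natural_logits_clean, labels, gamma=1.0):
    """Hypothetical joint objective for training both models together.

    The clean model is fit to the labels on natural data, while the robust
    model's logits on adversarial inputs are pulled toward the clean model's
    logits on the corresponding natural inputs. `gamma` is an assumed weight.
    """
    ce = cross_entropy(natural_logits_clean, labels)
    guide = boundary_guidance_loss(robust_logits_adv, natural_logits_clean)
    return ce + gamma * guide


# Toy batch: 2 samples, 4 classes.
labels = np.array([0, 3])
natural_logits = np.zeros((2, 4))   # logits of the clean model on natural data
robust_logits = np.ones((2, 4))     # logits of the robust model on adversarial data
loss = lbgat_style_objective(robust_logits, natural_logits, labels, gamma=0.5)
```

In a BGAT-style setting the clean model would be pretrained and held fixed, so only the guidance term would drive the robust model; in the LBGAT-style setting both terms are optimized together, which is what allows the boundary itself to adapt.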
