Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training
Adversarial training is a standard method for training deep neural networks to be robust to adversarial perturbations. Similar to the surprising clean generalization ability observed in the standard deep learning setting, neural networks trained by adversarial training also generalize well on unseen clean data. However, in contrast with clean generalization, while adversarial training is able to achieve a low robust training error, there still exists a significant robust generalization gap, which prompts us to explore what mechanism leads to both clean generalization and robust overfitting (CGRO) during the learning process. In this paper, we provide a theoretical understanding of this CGRO phenomenon in adversarial training. First, we propose a theoretical framework of adversarial training in which we analyze the feature learning process to explain how adversarial training drives the network learner into the CGRO regime. Specifically, we prove that, on our patch-structured dataset, the CNN model provably partially learns the true feature but exactly memorizes the spurious features from training-adversarial examples, which results in clean generalization and robust overfitting. Under a more general data assumption, we then show the efficiency of the CGRO classifier from the perspective of representation complexity. On the empirical side, to verify our theoretical analysis on real-world vision datasets, we investigate the dynamics of the loss landscape during training. Moreover, inspired by our experiments, we prove a robust generalization bound based on the global flatness of the loss landscape, which may be of independent interest.