
Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training

Main: 8 pages · 4 figures · 2 tables · Bibliography: 3 pages · Appendix: 17 pages
Abstract

Adversarial training is a standard method for training deep neural networks to be robust to adversarial perturbations. Mirroring the surprising clean generalization ability observed in the standard deep learning setting, neural networks trained by adversarial training also generalize well on unseen clean data. In contrast with clean generalization, however, while adversarial training achieves low robust training error, a significant robust generalization gap remains, which prompts us to explore what mechanism leads to both clean generalization and robust overfitting (CGRO) during the learning process. In this paper, we provide a theoretical understanding of the CGRO phenomenon in adversarial training. First, we propose a theoretical framework of adversarial training in which we analyze the feature learning process to explain how adversarial training drives the network learner into the CGRO regime. Specifically, we prove that, under our patch-structured dataset, the CNN model provably partially learns the true feature but exactly memorizes the spurious features from training adversarial examples, which results in clean generalization together with robust overfitting. Under more general data assumptions, we then show the efficiency of the CGRO classifier from the perspective of representation complexity. On the empirical side, to verify our theoretical analysis on real-world vision datasets, we investigate the dynamics of the loss landscape during training. Moreover, inspired by our experiments, we prove a robust generalization bound based on the global flatness of the loss landscape, which may be of independent interest.
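To make the adversarial training procedure discussed above concrete, here is a minimal self-contained sketch (not the paper's setup): a linear classifier with logistic loss on toy 2D Gaussian data, where the inner maximization is approximated by a single FGSM step under an ℓ∞ budget and the outer minimization is a gradient step on the resulting adversarial loss. All names, data, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: two well-separated Gaussian clusters, labels y in {-1, +1}.
n = 200
X = np.vstack([rng.normal(+2.0, 1.0, (n // 2, 2)),
               rng.normal(-2.0, 1.0, (n // 2, 2))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

w = np.zeros(2)          # linear classifier: predict sign(<w, x>)
eps, lr = 0.3, 0.1       # l_inf perturbation budget and learning rate

def input_grad(w, X, y):
    # Gradient of the logistic loss log(1 + exp(-y <w, x>)) w.r.t. x.
    return -y[:, None] * sigmoid(-y * (X @ w))[:, None] * w[None, :]

for _ in range(300):
    # Inner maximization (one FGSM step): x' = x + eps * sign(dL/dx).
    X_adv = X + eps * np.sign(input_grad(w, X, y))
    # Outer minimization: gradient step on the adversarial loss w.r.t. w.
    grad_w = np.mean(
        -y[:, None] * sigmoid(-y * (X_adv @ w))[:, None] * X_adv, axis=0)
    w -= lr * grad_w

clean_acc = np.mean(np.sign(X @ w) == y)
X_attack = X + eps * np.sign(input_grad(w, X, y))
robust_acc = np.mean(np.sign(X_attack @ w) == y)
print(f"clean train acc = {clean_acc:.2f}, robust train acc = {robust_acc:.2f}")
```

In practice the inner maximization uses multiple projected gradient (PGD) steps rather than a single FGSM step, and the model is a deep network; the min-max structure, however, is the same as in this sketch.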
