Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training
Adversarial training is a standard method for training deep neural networks to be robust to adversarial perturbations. Similar to the surprising clean generalization ability observed in the standard deep learning setting, neural networks trained by adversarial training also generalize well on unseen clean data. However, in contrast with clean generalization, while adversarial training is able to achieve a low robust training error, there still exists a significant robust generalization gap, which prompts us to explore what mechanism leads to both clean generalization and robust overfitting (CGRO) during the learning process. In this paper, we provide a theoretical understanding of this CGRO phenomenon in adversarial training. First, we propose a theoretical framework of adversarial training in which we analyze the feature learning process to explain how adversarial training drives the network learner into the CGRO regime. Specifically, we prove that, on our patch-structured dataset, the CNN model provably partially learns the true feature but exactly memorizes the spurious features from training-adversarial examples, which results in clean generalization and robust overfitting. Under a more general data assumption, we then show the efficiency of the CGRO classifier from the perspective of representation complexity. On the empirical side, to verify our theoretical analysis on real-world vision datasets, we investigate the dynamics of the loss landscape during training. Moreover, inspired by our experiments, we prove a robust generalization bound based on the global flatness of the loss landscape, which may be of independent interest.