Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless Norm Solution for Fast Adversarial Training
Adversarial training is a cornerstone of robust deep learning, but fast methods such as the Fast Gradient Sign Method (FGSM) often suffer from Catastrophic Overfitting (CO), where models become robust to single-step attacks but fail against multi-step variants. While existing solutions rely on noise injection, regularization, or gradient clipping, we propose a novel solution that mitigates CO purely by controlling the training norm. Our study is motivated by the empirical observation that CO is more prevalent under the ℓ∞ norm than the ℓ2 norm. Leveraging this insight, we develop a framework that formulates the generalized attack as a fixed-point problem and craft ℓp-FGSM attacks to understand the transition mechanics from ℓ2 to ℓ∞. This leads to our core insight: CO emerges when highly concentrated gradients, where information localizes in a few dimensions, interact with aggressive norm constraints. By quantifying gradient concentration through Participation Ratio and entropy measures, we develop an adaptive ℓp-FGSM that automatically tunes the training norm based on gradient information. Extensive experiments demonstrate that this approach achieves strong robustness without requiring additional regularization or noise injection, providing a novel and theoretically principled pathway to mitigating the CO problem.
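The abstract does not spell out how gradient concentration is measured. A minimal sketch, assuming the standard definitions of the Participation Ratio, PR(g) = (Σᵢ gᵢ²)² / Σᵢ gᵢ⁴, and the Shannon entropy of the normalized squared-gradient distribution (the function names and the NumPy implementation below are illustrative, not the authors' code):

```python
import numpy as np

def participation_ratio(g: np.ndarray) -> float:
    """PR = (sum g_i^2)^2 / sum g_i^4.

    Ranges from 1 (all gradient mass in a single dimension, i.e. maximal
    concentration) up to len(g) (mass spread evenly over all dimensions).
    """
    g2 = g.astype(np.float64) ** 2
    return float(g2.sum() ** 2 / (g2 ** 2).sum())

def gradient_entropy(g: np.ndarray) -> float:
    """Shannon entropy of p_i = g_i^2 / sum_j g_j^2 (in nats)."""
    p = g.astype(np.float64) ** 2
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())

# A gradient localized in one dimension vs. one spread over all 1000.
concentrated = np.zeros(1000)
concentrated[0] = 1.0
diffuse = np.ones(1000)

print(participation_ratio(concentrated))  # → 1.0
print(participation_ratio(diffuse))       # → 1000.0
```

Under these definitions, a small PR (or low entropy) flags the concentrated-gradient regime the abstract associates with CO under aggressive (ℓ∞-like) norm constraints, which is the signal an adaptive ℓp-FGSM could use to soften the training norm.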