Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

Recent studies have identified a critical challenge in deep neural networks (DNNs) known as "robust fairness", where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue in adversarial robustness, the worst-class certified robustness of smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound for the worst-class error of smoothed classifiers. Through theoretical analysis, we demonstrate that the largest eigenvalue of the smoothed confusion matrix fundamentally influences the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that optimizes the largest eigenvalue of the smoothed confusion matrix to enhance the worst-class accuracy of the smoothed classifier and further improve its worst-class certified robustness. We provide extensive experimental validation across multiple datasets and model architectures to demonstrate the effectiveness of our approach.
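To make the quantities in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the smoothed classifier is a majority vote over Gaussian-perturbed inputs, that the confusion matrix holds row-normalised misclassification rates with a zero diagonal, and that the "largest eigenvalue" can be read as the largest singular value of that matrix. The names `smoothed_predict`, `error_confusion_matrix`, `principal_value`, and `worst_class_error` are hypothetical, as are the `sigma` and `n_samples` defaults.

```python
import numpy as np


def smoothed_predict(predict, X, num_classes, sigma=0.25, n_samples=25, rng=None):
    """Majority vote of a base classifier over Gaussian-perturbed copies of X.

    `predict` is any callable mapping an (n, d) array to n integer labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = np.zeros((X.shape[0], num_classes), dtype=np.int64)
    rows = np.arange(X.shape[0])
    for _ in range(n_samples):
        noisy = X + sigma * rng.standard_normal(X.shape)
        votes[rows, predict(noisy)] += 1
    return votes.argmax(axis=1)


def error_confusion_matrix(y_true, y_pred, num_classes):
    """Entry (i, j) is the empirical rate at which class i is predicted as j.

    The diagonal is zeroed so that only errors remain (an assumption here).
    """
    counts = np.zeros((num_classes, num_classes))
    np.add.at(counts, (y_true, y_pred), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    C = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    np.fill_diagonal(C, 0.0)
    return C


def principal_value(C):
    """Largest singular value of C, i.e. sqrt of the top eigenvalue of C^T C."""
    return np.linalg.norm(C, 2)


def worst_class_error(C):
    """Largest per-class error rate, i.e. the maximum row sum of C."""
    return C.sum(axis=1).max()
```

Under these assumptions the worst-class error is the maximum row sum of the matrix, which is at most the square root of the number of classes times its largest singular value. That gives one elementary reason why shrinking the principal value also constrains the worst-class error; the PAC-Bayesian bound in the paper is a separate and stronger statement.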
@article{jin2025_2503.17172,
  title={Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers},
  author={Gaojie Jin and Tianjin Huang and Ronghui Mu and Xiaowei Huang},
  journal={arXiv preprint arXiv:2503.17172},
  year={2025}
}