Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers

21 March 2025

Abstract

Recent studies have identified a critical challenge in deep neural networks (DNNs) known as "robust fairness", where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue in adversarial robustness, worst-class certified robustness for smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound for the worst-class error of smoothed classifiers. Through theoretical analysis, we demonstrate that the largest eigenvalue of the smoothed confusion matrix fundamentally influences the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that optimizes the largest eigenvalue of the smoothed confusion matrix to enhance the worst-class accuracy of the smoothed classifier and further improve its worst-class certified robustness. We provide extensive experimental validation across multiple datasets and model architectures to demonstrate the effectiveness of our approach.
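To make the quantity named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the smoothed classifier is a majority vote of a base classifier under Gaussian input noise, and that the confusion matrix has a zero diagonal with entry (i, j) equal to the fraction of class-j samples predicted as class i; the abstract does not give the paper's exact definitions, so both are assumptions. The function names (`smoothed_predict`, `confusion_matrix_offdiag`, `largest_eigenvalue`) are hypothetical.

```python
import numpy as np

def smoothed_predict(base_predict, x, sigma, n_samples, rng):
    """Majority vote of base_predict over Gaussian perturbations of x.

    base_predict is assumed to map a batch of inputs to integer labels.
    """
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    votes = base_predict(x[None, ...] + noise)  # shape: (n_samples,)
    return int(np.bincount(votes).argmax())

def confusion_matrix_offdiag(y_true, y_pred, num_classes):
    """Assumed form: C[i, j] = fraction of class-j samples predicted as i, diagonal 0."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    C = np.zeros((num_classes, num_classes))
    for j in range(num_classes):
        mask = y_true == j
        if not mask.any():
            continue  # no samples of class j: leave the column at zero
        for i in range(num_classes):
            if i != j:
                C[i, j] = np.mean(y_pred[mask] == i)
    return C

def largest_eigenvalue(C):
    """Largest eigenvalue modulus (spectral radius) of C."""
    return float(np.max(np.abs(np.linalg.eigvals(C))))

# Toy usage with hand-written labels standing in for smoothed predictions.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 0, 0, 1, 1])
C = confusion_matrix_offdiag(y_true, y_pred, num_classes=2)
rho = largest_eigenvalue(C)
worst_class_error = float(C.sum(axis=0).max())  # largest per-class error rate
```

Under these assumptions each column sum of C is the error rate of one class, so the maximum column sum is the worst-class error. For any entrywise nonnegative matrix the spectral radius is itself an eigenvalue and is at most the maximum column sum, which is one elementary link between the two quantities; the paper's PAC-Bayesian bound is a separate result that this sketch does not reproduce.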


@article{jin2025_2503.17172,
  title={Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers},
  author={Gaojie Jin and Tianjin Huang and Ronghui Mu and Xiaowei Huang},
  journal={arXiv preprint arXiv:2503.17172},
  year={2025}
}
