On the Generalization of Adversarially Trained Quantum Classifiers

Quantum classifiers are vulnerable to adversarial attacks that manipulate their classical or quantum input data. A promising countermeasure is adversarial training, in which quantum classifiers are trained using an attack-aware, adversarial loss function. This work establishes novel bounds on the generalization error of adversarially trained quantum classifiers when tested in the presence of perturbation-constrained adversaries. The bounds quantify how the excess generalization error incurred to ensure robustness to adversarial attacks scales with the training sample size, while yielding insights into the impact of the quantum embedding. For quantum binary classifiers employing \textit{rotation embedding}, we find that, in the presence of adversarial attacks on classical inputs, the increase in sample complexity due to adversarial training over conventional training vanishes in the limit of high-dimensional inputs. In contrast, when the adversary can directly attack the quantum state encoding the input, the excess generalization error depends on the choice of embedding only through its Hilbert space dimension. The results are also extended to multi-class classifiers. We validate our theoretical findings with numerical experiments.
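
As a rough illustration of the attack-aware training objective referred to above, one standard formulation of adversarial training minimizes the worst-case empirical loss over norm-bounded perturbations of the classical inputs; the notation here (training set size $n$, perturbation budget $\epsilon$, loss $\ell$, parameterized classifier $f_\theta$) is generic and not necessarily the paper's exact setup:
\[
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \; \max_{\|\delta_i\| \le \epsilon} \; \ell\big( f_\theta(x_i + \delta_i),\, y_i \big).
\]
The adversarial generalization error studied in such settings is the gap between the population and empirical values of this worst-case risk. Similarly, a common form of rotation (angle) embedding, which may differ in detail from the specific embedding analyzed in the paper, encodes a $d$-dimensional classical input $x = (x_1, \ldots, x_d)$ into $d$ qubits via single-qubit rotations,
\[
|\psi(x)\rangle \;=\; \bigotimes_{j=1}^{d} R_Y(x_j)\, |0\rangle,
\]
so that an adversary may perturb either the classical vector $x$ or the resulting quantum state $|\psi(x)\rangle$ directly.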
@article{georgiou2025_2504.17690,
  title   = {On the Generalization of Adversarially Trained Quantum Classifiers},
  author  = {Petros Georgiou and Aaron Mark Thomas and Sharu Theresa Jose and Osvaldo Simeone},
  journal = {arXiv preprint arXiv:2504.17690},
  year    = {2025}
}