Reward-Preserving Attacks For Robust Reinforcement Learning

Lucas Schott
Elies Gherbi
Hatem Hajri
Sylvain Lamprier
Main: 9 pages
6 figures
Bibliography: 1 page
Appendix: 9 pages
Abstract

Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate strength varies by state. We propose $\alpha$-reward-preserving attacks, which adapt the strength of the adversary so that an $\alpha$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $\eta \le \eta_{\mathcal B}$ selected via a critic $Q^{\pi}_\alpha((s,a),\eta)$ trained off-policy over diverse radii. This adaptive tuning calibrates attack strength and, with intermediate $\alpha$, improves robustness across radii while preserving nominal performance, outperforming fixed- and random-radius baselines.
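The abstract describes selecting a state-dependent attack radius with a critic trained over diverse radii, then applying a gradient-based perturbation of that magnitude. Below is a minimal, hypothetical PyTorch sketch of that idea; the names `policy`, `q_alpha`, and `candidate_radii`, and the exact thresholding rule used to keep "an $\alpha$ fraction of the return gap achievable", are assumptions for illustration and not the authors' implementation.

```python
# Hypothetical sketch of an alpha-reward-preserving attack step (not the authors' code).
import torch
import torch.nn.functional as F

def select_radius(q_alpha, state, action, candidate_radii, alpha):
    """Pick the largest radius whose predicted return keeps an alpha fraction
    of the nominal-to-worst-case gap achievable (assumed selection rule)."""
    with torch.no_grad():
        # q_alpha(state, action, eta) estimates the return under an attack of radius eta.
        values = torch.stack([q_alpha(state, action, eta) for eta in candidate_radii]).squeeze()
        q_nominal = values[0]              # assumes candidate_radii[0] == 0 (no attack)
        q_worst = values.min()             # proxy for the worst-case return over radii
        threshold = q_worst + alpha * (q_nominal - q_worst)
        admissible = values >= threshold
        # Assumes candidate_radii is sorted ascending; take the strongest admissible radius.
        idx = torch.nonzero(admissible).max() if admissible.any() else torch.tensor(0)
    return candidate_radii[idx]

def reward_preserving_attack(policy, q_alpha, state, candidate_radii, alpha):
    """FGSM-style observation attack whose magnitude is chosen per state by the critic."""
    with torch.no_grad():
        action = policy(state)
    eta = select_radius(q_alpha, state, action, candidate_radii, alpha)
    # Gradient-based direction: push the policy output away from its nominal action,
    # then scale the signed gradient by the selected radius.
    state_adv = state.clone().detach().requires_grad_(True)
    loss = F.mse_loss(policy(state_adv), action)
    grad, = torch.autograd.grad(loss, state_adv)
    return (state + eta * grad.sign()).detach()
```

In this sketch the candidate radii are all bounded by $\eta_{\mathcal B}$, and the attack direction is a simple signed-gradient step on the policy output; the paper's actual attack direction and critic training procedure are described in the full text.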
