On Corruption-Robustness in Performative Reinforcement Learning

Abstract

In performative reinforcement learning (RL), an agent faces a policy-dependent environment: the reward and transition functions depend on the agent's policy. Prior work on performative RL has studied the convergence of repeated retraining approaches to a performatively stable policy. In the finite sample regime, these approaches repeatedly solve for a saddle point of a convex-concave objective, which estimates the Lagrangian of a regularized version of the reinforcement learning problem. In this paper, we aim to extend such repeated retraining approaches, enabling them to operate under corrupted data. More specifically, we consider Huber's ε-contamination model, where an ε fraction of data points is corrupted by arbitrary adversarial noise. We propose a repeated retraining approach based on convex-concave optimization under corrupted gradients and a novel problem-specific robust mean estimator for the gradients. We prove that our approach exhibits last-iterate convergence to an approximately stable policy, with the approximation error linear in √ε. We experimentally demonstrate the importance of accounting for corruption in performative RL.
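To make the corruption model concrete, below is a minimal Python sketch (not from the paper) of Huber's ε-contamination applied to a batch of gradient samples, together with a generic coordinate-wise trimmed-mean estimator. The paper's estimator is problem-specific and differs from this; the sketch only illustrates why a naive average breaks down under an ε fraction of adversarial samples while a robust estimator does not. All names and constants here are hypothetical.

```python
import numpy as np

def contaminate(gradients, eps, rng, adversary=None):
    """Huber eps-contamination: replace an eps fraction of gradient samples
    with arbitrary vectors chosen by an adversary (illustrative choice here)."""
    n, d = gradients.shape
    corrupted = gradients.copy()
    n_bad = int(eps * n)
    bad_idx = rng.choice(n, size=n_bad, replace=False)
    if adversary is None:
        # Hypothetical adversary: large-magnitude shift in a fixed direction.
        adversary = 100.0 * np.ones(d)
    corrupted[bad_idx] = adversary
    return corrupted

def trimmed_mean(samples, eps):
    """Coordinate-wise trimmed mean: drop the eps-tails in each coordinate
    before averaging -- a standard generic robust mean estimator."""
    n = samples.shape[0]
    k = int(np.ceil(eps * n))
    sorted_samples = np.sort(samples, axis=0)
    if k > 0:
        sorted_samples = sorted_samples[k:n - k]
    return sorted_samples.mean(axis=0)

rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, scale=0.5, size=(1000, 5))  # i.i.d. gradient samples
eps = 0.05
dirty = contaminate(clean, eps, rng)

print("naive mean  :", dirty.mean(axis=0))        # pulled toward the adversary
print("trimmed mean:", trimmed_mean(dirty, eps))  # stays close to the true mean (~1)
```

In this toy setting the naive average is shifted by roughly ε times the adversary's magnitude, whereas the trimmed mean remains near the true gradient mean, which is the kind of bounded-bias guarantee a robust gradient estimator must supply to the convex-concave optimization inside repeated retraining.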

@article{pollatos2025_2505.05609,
  title={On Corruption-Robustness in Performative Reinforcement Learning},
  author={Vasilis Pollatos and Debmalya Mandal and Goran Radanovic},
  journal={arXiv preprint arXiv:2505.05609},
  year={2025}
}