Robust Zero-Sum Deep Reinforcement Learning

Abstract

This paper presents a methodology for evaluating the sensitivity of deep reinforcement learning policies. This is important when agents are trained in a simulated environment and their sensitivity must be quantified before they are exposed to the real world, where employing RL policies can be hazardous. In addition, we provide a framework, inspired by H-infinity control theory, for building maximum robustness into a trained deep reinforcement learning policy. The robust training framework involves a two-player zero-sum iterative dynamic game that pits an agent and its environment against an adversary. By formalizing an MPC trajectory optimization framework for this two-player system, we derive a saddle-point equilibrium and posit that the trained policy will be maximally robust to the "worst" possible disturbance.
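The saddle-point idea behind the zero-sum formulation can be illustrated with a minimal sketch (the objective and function names here are illustrative, not the paper's actual algorithm): a protagonist descends and an adversary ascends a shared scalar objective until neither can improve unilaterally.

```python
# Minimal sketch of a zero-sum saddle-point search via simultaneous
# gradient descent-ascent on J(u, v) = u**2 - v**2, whose unique
# saddle point is (u, v) = (0, 0). This is an illustration of the
# saddle-point concept, not the paper's MPC trajectory optimizer.

def saddle_point_gda(u=1.0, v=1.0, lr=0.1, steps=200):
    """Protagonist u minimizes J; adversary v maximizes J."""
    for _ in range(steps):
        du = 2 * u    # dJ/du
        dv = -2 * v   # dJ/dv
        u -= lr * du  # descent step for the protagonist
        v += lr * dv  # ascent step for the adversary
    return u, v

u_star, v_star = saddle_point_gda()
print(u_star, v_star)  # both iterates shrink toward the saddle point (0, 0)
```

At the saddle point, the protagonist's policy is optimal against the adversary's worst-case disturbance, which is the equilibrium property the robustness claim rests on.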
