Robust Zero-Sum Deep Reinforcement Learning
This paper presents a methodology for evaluating the sensitivity of deep reinforcement learning policies. This matters when agents are trained in a simulated environment and their sensitivity must be quantified before they are exposed to the real world, where deploying brittle RL policies can be hazardous. In addition, we provide a framework, inspired by H-infinity control theory, for building maximal robustness into trained deep reinforcement learning policies. This robust training framework involves a two-player zero-sum iterative dynamic game in a concave-convex environment, in which the agents' goal is to drive the dynamics to a saddle region. By formulating an MPC trajectory optimization framework for this two-player system, we evaluate our hypothesis on the guided policy search algorithm; without loss of generality, we posit that deep RL policies trained in this fashion will be maximally robust to a worst-case disturbance.
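For context, the zero-sum game described here typically takes the standard H-infinity-inspired saddle-point form (a generic sketch; the symbols below are illustrative and not necessarily the paper's exact notation). The protagonist's controls u minimize a cost that the adversary's disturbances v maximize, with the disturbance penalized by a gain γ:

    min_u max_v  J(u, v) = Σ_{t=0}^{T} [ ℓ(x_t, u_t) − γ² ‖v_t‖² ],   subject to  x_{t+1} = f(x_t, u_t, v_t),

and the sought saddle point (u*, v*) satisfies J(u*, v) ≤ J(u*, v*) ≤ J(u, v*) for all admissible u and v, so the learned policy is certified against the worst admissible disturbance of bounded magnitude.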