Scalable First-Order Methods for Robust MDPs

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of $O(A^{2} S^{3} \log(S) \log(\epsilon^{-1}) \epsilon^{-1})$ for the best choice of parameters on ellipsoidal and Kullback-Leibler $s$-rectangular uncertainty sets, where $S$ and $A$ are the numbers of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of $O(A^{1.5} S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with $s$-rectangular KL uncertainty sets.
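To make the interleaving concrete, the following is a minimal Python sketch of robust value iteration in which each Bellman update is approximated by projected-gradient (first-order) steps of an adversary over an ellipsoidal ball around a nominal transition kernel. This is an illustration only, not the paper's algorithm: it uses a simplified $(s,a)$-rectangular ball rather than an $s$-rectangular set, a heuristic alternating projection rather than an exact one, and all names (`robust_vi_fo`, `project_uncertainty`, step sizes, iteration counts) are assumptions made for the example.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_ball(p, center, radius):
    """Euclidean projection of p onto the L2 ball of given radius around center."""
    d = p - center
    n = np.linalg.norm(d)
    return center + d * (radius / n) if n > radius else p

def project_uncertainty(p, center, radius, steps=10):
    """Heuristic alternating projection onto {simplex} ∩ {ball around center}.
    (An approximation for illustration; exact projection needs more care.)"""
    for _ in range(steps):
        p = project_ball(p, center, radius)
        p = project_simplex(p)
    return p

def robust_vi_fo(P0, R, gamma=0.9, radius=0.05, outer=60, inner=30, lr=0.1):
    """Robust value iteration with first-order inner updates.
    P0: (S, A, S) nominal kernel, R: (S, A) rewards."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(outer):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Inner first-order loop: the adversary takes projected
                # gradient steps to minimize the continuation value p @ V.
                p = P0[s, a].copy()
                for _ in range(inner):
                    p = project_uncertainty(p - lr * gamma * V, P0[s, a], radius)
                Q[s, a] = R[s, a] + gamma * p @ V
        V = Q.max(axis=1)  # greedy (max over actions) outer Bellman update
    return V

# Usage on a small random MDP:
rng = np.random.default_rng(0)
S, A = 5, 3
P0 = rng.random((S, A, S))
P0 /= P0.sum(axis=-1, keepdims=True)
R = rng.random((S, A))
print(robust_vi_fo(P0, R))
```

The sketch shows the key tradeoff the abstract refers to: the inner loop's iteration count controls how accurately each Bellman update is solved, while the outer loop drives the value function toward the robust fixed point.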