
Scalable First-Order Methods for Robust MDPs

Abstract

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve a convergence rate of $O(A^{2}S^{3}\log(S)\log(\epsilon^{-1})\epsilon^{-1})$ for the best choice of parameters on ellipsoidal and Kullback-Leibler $s$-rectangular uncertainty sets, where $S$ and $A$ denote the number of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of $O(A^{1.5}S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets, we show that our algorithm is significantly more scalable than state-of-the-art approaches.
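To make the interleaving idea concrete, below is a minimal toy sketch, not the authors' algorithm: outer Value Iteration sweeps whose robust Bellman updates are solved only approximately by a few projected-gradient (first-order) steps of an adversary over an ellipsoidal uncertainty set around a nominal kernel. For simplicity it uses the easier $(s,a)$-rectangular case rather than the paper's $s$-rectangular sets, and a heuristic alternating projection onto the ball-simplex intersection. All names (`robust_vi`, `project_ellipsoid_simplex`, `radius`, `inner`, etc.) are illustrative assumptions.

```python
import numpy as np

def project_ellipsoid_simplex(p, p0, radius, iters=50):
    """Heuristic projection of p onto {q in simplex : ||q - p0||_2 <= radius}
    via alternating projections (a stand-in for an exact projection oracle)."""
    for _ in range(iters):
        # project onto the Euclidean ball around the nominal distribution p0
        d = p - p0
        n = np.linalg.norm(d)
        if n > radius:
            p = p0 + d * (radius / n)
        # project onto the probability simplex (sort-based projection)
        u = np.sort(p)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u - css / (np.arange(len(p)) + 1) > 0)[0][-1]
        p = np.maximum(p - css[rho] / (rho + 1.0), 0.0)
    return p

def robust_vi(P0, r, gamma, radius, outer=200, inner=5, lr=0.1):
    """Value Iteration with inexact robust Bellman updates: the adversarial
    kernel at each (s, a) is refined by `inner` projected-gradient steps
    instead of being solved to full accuracy."""
    S, A = r.shape
    V = np.zeros(S)
    P = P0.copy()  # adversary's current kernel, warm-started across sweeps
    for _ in range(outer):
        for s in range(S):
            for a in range(A):
                # first-order inner loop: the adversary descends <P[s, a], V>,
                # whose gradient with respect to P[s, a] is simply V
                for _ in range(inner):
                    P[s, a] = project_ellipsoid_simplex(
                        P[s, a] - lr * V, P0[s, a], radius)
        Q = r + gamma * P @ V  # Q-values under the current adversarial kernel
        V = Q.max(axis=1)      # greedy outer Value Iteration update
    return V

# toy usage on a random 3-state, 2-action MDP
rng = np.random.default_rng(0)
P0 = rng.dirichlet(np.ones(3), size=(3, 2))
r = rng.random((3, 2))
print(robust_vi(P0, r, gamma=0.9, radius=0.1))
```

The scalability lever in the abstract corresponds to the `inner` parameter here: cheap, inexact inner solves keep each sweep fast, while warm-starting the adversary's kernel lets accuracy improve across sweeps.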
