Non-asymptotic Performances of Robust Markov Decision Processes

Wenhao Yang
Abstract

In this paper, we study the non-asymptotic performance of the optimal robust policy, measured by the robust value function under the true transition dynamics. The optimal robust policy is computed from a generative model or an offline dataset, without access to the true transition dynamics. In particular, we consider three different uncertainty sets, namely the $L_1$, $\chi^2$, and KL balls, under both the $(s,a)$-rectangular and $s$-rectangular assumptions. Our results show that under the $(s,a)$-rectangular assumption on the uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$ in the generative model setting and $\widetilde{O}\left(\frac{|\mathcal{S}|}{\nu_{\min}\varepsilon^2\rho^2(1-\gamma)^4}\right)$ in the offline dataset setting. While prior work on non-asymptotic performance is restricted to the KL ball and the $(s,a)$-rectangular assumption, we also extend our results to the more general $s$-rectangular assumption, which leads to a larger sample complexity than in the $(s,a)$-rectangular case.
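For context, here is a minimal sketch of the robust MDP formulation these bounds refer to. The abstract does not spell out the definitions, so the notation below (e.g., $P^o$ for the nominal transition kernel) follows common conventions in the robust RL literature and should be read as an assumption rather than the paper's exact formulation:

```latex
% Robust value function of a policy \pi: the worst-case discounted
% return over all transition kernels in the uncertainty set \mathcal{P}.
V^{\pi}_{\mathcal{P}}(s)
  = \inf_{P \in \mathcal{P}}
    \mathbb{E}_{P,\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)
      \,\middle|\, s_0 = s\right]

% (s,a)-rectangular uncertainty set: each state-action pair has its own
% independent ball of radius \rho around the nominal kernel P^o, where
% D is the L_1 distance, \chi^2 divergence, or KL divergence.
\mathcal{P} = \bigotimes_{(s,a) \in \mathcal{S} \times \mathcal{A}}
  \left\{ P(\cdot \mid s,a) \;:\;
    D\!\left(P(\cdot \mid s,a) \,\middle\|\, P^{o}(\cdot \mid s,a)\right)
    \le \rho \right\}

% s-rectangular relaxation (one common convention): the constraint
% couples all actions at each state, e.g. by bounding the divergence
% summed over actions, which makes the set strictly more general.
\mathcal{P} = \bigotimes_{s \in \mathcal{S}}
  \left\{ P(\cdot \mid s, \cdot) \;:\;
    \sum_{a \in \mathcal{A}}
      D\!\left(P(\cdot \mid s,a) \,\middle\|\, P^{o}(\cdot \mid s,a)\right)
    \le |\mathcal{A}|\,\rho \right\}
```

Under this reading, the non-asymptotic guarantees bound the gap $\max_s \left(V^{\pi^*}_{\mathcal{P}}(s) - V^{\hat{\pi}}_{\mathcal{P}}(s)\right)$, where $\hat{\pi}$ is the robust policy solved from the generative model or offline data and both value functions are evaluated under the true dynamics.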
