Non-asymptotic Performances of Robust Markov Decision Processes
In this paper, we study the non-asymptotic performance of the optimal robust policy, evaluated on the robust value function under the true transition dynamics. The optimal robust policy is computed from a generative model or an offline dataset, without access to the true transition dynamics. In particular, we consider three uncertainty sets, namely the $L_1$, $\chi^2$, and KL balls, under both the $(s,a)$-rectangular and $s$-rectangular assumptions. Our results show that under the $(s,a)$-rectangular assumption we obtain non-asymptotic sample complexity bounds in both the generative model setting and the offline dataset setting. While prior works on non-asymptotic performance are restricted to the KL ball and the $(s,a)$-rectangular assumption, we also extend our results to the more general $s$-rectangular assumption, which leads to a larger sample complexity than the $(s,a)$-rectangular assumption.
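To fix notation, here is a minimal sketch of the standard robust MDP objective and of an $(s,a)$-rectangular uncertainty set; the symbols $P^o$ (nominal kernel), $\rho$ (ball radius), and $D$ (divergence) are generic placeholders, not necessarily the paper's exact notation.

```latex
% Robust value function: worst-case discounted return over all transition
% kernels in the uncertainty set \mathcal{P}; the robust optimal policy
% maximizes this worst-case value.
V^{\pi}_{\mathcal{P}}(s) \;=\; \inf_{P \in \mathcal{P}}\;
  \mathbb{E}_{\pi, P}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\Big|\, s_0 = s\Big],
\qquad
\pi^{\star} \in \arg\max_{\pi}\, V^{\pi}_{\mathcal{P}}.

% (s,a)-rectangular uncertainty set: a product, over state-action pairs, of
% divergence balls of radius \rho around a nominal kernel P^o, where D can be
% the L_1, \chi^2, or KL divergence.
\mathcal{P} \;=\; \bigotimes_{(s,a) \in \mathcal{S}\times\mathcal{A}}
  \Big\{\, P(\cdot \mid s,a) \in \Delta(\mathcal{S}) \;:\;
    D\big(P(\cdot \mid s,a),\, P^{o}(\cdot \mid s,a)\big) \le \rho \,\Big\}.
```

The $s$-rectangular assumption relaxes this by coupling the perturbations across actions within each state, which is why it is more general and yields a larger sample complexity.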