396
v1v2v3 (latest)

Q-Learning under Finite Model Uncertainty

Main:31 Pages
3 Figures
Bibliography:3 Pages
3 Tables
Abstract

We propose a robust Q-learning algorithm for Markov decision processes under model uncertainty when each state-action pair is associated with a finite ambiguity set of candidate transition kernels. This finite-measure framework enables highly flexible, user-designed uncertainty models and goes beyond the common KL and Wasserstein ball formulations. We establish almost sure convergence of the learned Q-function to the robust optimum, and derive non-asymptotic high-probability error bounds that separate stochastic approximation error from transition-kernel estimation error. Finally, we show that Wasserstein ball and parametric ambiguity sets can be approximated by finite ambiguity sets, allowing our algorithm to be used as a generic solver beyond the finite setting.

View on arXiv
Comments on this paper