MDP Geometry, Normalization and Reward Balancing Solvers

Main: 7 pages
13 figures
Bibliography: 2 pages
Appendix: 22 pages

Abstract

The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This procedure enables the development of a novel class of algorithms for solving MDPs that find optimal policies without explicitly computing policy values. The new algorithms we propose for different settings achieve and, in some cases, improve upon state-of-the-art sample complexity results.
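
The invariance described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and assumes the normalization acts like a per-state reward shift of the form r'(s,a) = r(s,a) - delta(s) + gamma * sum_s' P(s'|s,a) * delta(s'). The 2-state MDP, the names `q_values`, `advantages`, and `shift_rewards`, and all numbers are hypothetical. Under that assumption, every policy's value at state s moves by exactly -delta(s) while every advantage stays the same.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are made up for illustration.
GAMMA = 0.9
# P[s][a] is the list of next-state probabilities, R[s][a] the immediate reward.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0],
     [2.0, -1.0]]


def q_values(P, R, policy, gamma=GAMMA, sweeps=2000):
    """Iterative policy evaluation for a deterministic policy; returns (V, Q)."""
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s][policy[s]]
             + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
             for s in range(n)]
    Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
          for a in range(len(P[s]))]
         for s in range(n)]
    return V, Q


def advantages(P, R, policy, gamma=GAMMA):
    """Advantage A(s, a) = Q(s, a) - V(s) of every action under `policy`."""
    V, Q = q_values(P, R, policy, gamma)
    return [[Q[s][a] - V[s] for a in range(len(Q[s]))] for s in range(len(Q))]


def shift_rewards(P, R, delta, gamma=GAMMA):
    """Assumed shift: r'(s,a) = r(s,a) - delta[s] + gamma * E[delta[s']]."""
    return [[R[s][a] - delta[s]
             + gamma * sum(p * d for p, d in zip(P[s][a], delta))
             for a in range(len(R[s]))]
            for s in range(len(R))]


if __name__ == "__main__":
    delta = [3.0, -1.5]  # arbitrary per-state adjustment
    R_shifted = shift_rewards(P, R, delta)
    for policy in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(policy, advantages(P, R, policy), advantages(P, R_shifted, policy))
```

Choosing `delta` equal to the value function of a particular policy makes that policy's values zero in the shifted MDP, which is one way to read "adjusting the value function at each state" while leaving all advantages intact.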
