MDP Geometry, Normalization and Reward Balancing Solvers

9 July 2024

Arsenii Mustafin


Main:7 Pages

13 Figures

Bibliography:2 Pages

Appendix:22 Pages

Abstract

The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This procedure enables the development of a novel class of algorithms for solving MDPs that find optimal policies without explicitly computing policy values. The new algorithms we propose for different settings achieve and, in some cases, improve upon state-of-the-art sample complexity results.
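The invariance described in the abstract can be illustrated numerically. The sketch below is not taken from the paper; it assumes a standard potential-style reward shift, r'(s, a) = r(s, a) - delta(s) + gamma * sum over s' of P(s' | s, a) * delta(s'), which is one transformation known to have the stated property: it moves the value of every policy at state s by -delta(s) while leaving every advantage unchanged. The names `policy_values`, `advantages`, `shift_rewards`, and `delta`, and the toy two-state MDP, are hypothetical and chosen only for illustration.

```python
# Toy check: a per-state shift of the value function that keeps advantages fixed.
# ASSUMPTION: the shift below is a standard potential-style transformation,
# used here for illustration; the paper's own procedure may differ in detail.

def q_value(P, r, V, gamma, s, a):
    """One-step lookahead: r(s, a) + gamma * E[V(s')]."""
    return r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def policy_values(P, r, pi, gamma, iters=2000):
    """Iterative policy evaluation for a stochastic policy pi[s][a]."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [
            sum(pi[s][a] * q_value(P, r, V, gamma, s, a) for a in range(len(P[s])))
            for s in range(len(P))
        ]
    return V

def advantages(P, r, pi, gamma):
    """A(s, a) = Q(s, a) - V(s) under policy pi."""
    V = policy_values(P, r, pi, gamma)
    return [
        [q_value(P, r, V, gamma, s, a) - V[s] for a in range(len(P[s]))]
        for s in range(len(P))
    ]

def shift_rewards(P, r, delta, gamma):
    """r'(s, a) = r(s, a) - delta(s) + gamma * E[delta(s')] (hypothetical helper)."""
    return [
        [
            r[s][a] - delta[s] + gamma * sum(p * d for p, d in zip(P[s][a], delta))
            for a in range(len(P[s]))
        ]
        for s in range(len(P))
    ]

# Hypothetical MDP: P[s][a] is a distribution over next states.
P = [[[0.8, 0.2], [0.1, 0.9]], [[0.5, 0.5], [0.3, 0.7]]]
r = [[1.0, 0.0], [2.0, -1.0]]
pi = [[0.6, 0.4], [0.5, 0.5]]
gamma = 0.9
delta = [3.0, -2.0]  # arbitrary per-state shift

r_shifted = shift_rewards(P, r, delta, gamma)
V = policy_values(P, r, pi, gamma)
V_shifted = policy_values(P, r_shifted, pi, gamma)
A = advantages(P, r, pi, gamma)
A_shifted = advantages(P, r_shifted, pi, gamma)

print("value change per state:", [vs - v for v, vs in zip(V, V_shifted)])
print("max advantage change:",
      max(abs(x - y) for ra, rb in zip(A, A_shifted) for x, y in zip(ra, rb)))
```

Because advantages determine which actions a policy prefers, a shift of this kind changes state values freely without changing which policies are optimal, which is the property the abstract attributes to its normalization.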
