MDP Geometry, Normalization and Value Free Solvers

9 July 2024

Arsenii Mustafin

Main: 7 pages

Figures: 13

Bibliography: 2 pages

Appendix: 22 pages

Abstract

The Markov Decision Process (MDP) is a widely used mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs. Based on this interpretation, we show that MDPs can be divided into equivalence classes within which the dynamics of key solving algorithms are indistinguishable. The related normalization procedure enables the development of a novel class of MDP-solving algorithms that find optimal policies without computing policy values. The new algorithms we propose for different settings match and, in some cases, improve upon state-of-the-art results.
