Skip-free Markov decision processes
We introduce a class of models for multidimensional control problems which we call skip-free Markov decision processes, and describe and analyse an algorithm applicable to Markov decision processes that are skip-free in the negative direction. Starting with the finite average cost case, we show that the algorithm combines the advantages of both value iteration and policy iteration: it is guaranteed to converge to an optimal policy and optimal value function after a finite number of iterations, while the computational effort required for each iteration step is comparable to that of value iteration. We show that the algorithm can easily be extended to solve continuous time models, discounted cost models and communicating models, and that it provides new insights into the formulation of the constraints in the linear programming treatment of skip-free models.
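To make the comparison concrete, the following is a minimal sketch of standard value iteration for a finite discounted-cost MDP, the baseline method the abstract refers to. The two-state MDP below is an invented toy example, not taken from the paper, and the sketch is not the paper's skip-free algorithm.

```python
# Illustrative value iteration for a small finite MDP with discounted cost.
# The MDP data (states, actions, costs, transitions) is a made-up toy
# example used only to show the baseline method, not the paper's model.

def value_iteration(states, actions, cost, P, beta=0.9, tol=1e-8):
    """Compute the optimal discounted-cost value function and a greedy policy.

    cost[s][a]  : immediate cost of taking action a in state s
    P[s][a][t]  : probability of moving from state s to state t under a
    beta        : discount factor in (0, 1)
    """
    V = {s: 0.0 for s in states}
    while True:
        # Bellman update: minimise one-step cost plus discounted future cost.
        V_new = {
            s: min(
                cost[s][a] + beta * sum(P[s][a][t] * V[t] for t in states)
                for a in actions
            )
            for s in states
        }
        converged = max(abs(V_new[s] - V[s]) for s in states) < tol
        V = V_new
        if converged:
            break
    # Extract a greedy (optimal) policy from the converged value function.
    policy = {
        s: min(
            actions,
            key=lambda a: cost[s][a]
            + beta * sum(P[s][a][t] * V[t] for t in states),
        )
        for s in states
    }
    return V, policy

# Toy two-state example: in state 0, action 0 is cheap and stays put;
# in state 1, action 1 pays a little extra to move back to state 0.
states, actions = [0, 1], [0, 1]
cost = {0: {0: 1.0, 1: 2.0}, 1: {0: 3.0, 1: 2.0}}
P = {
    0: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}},
    1: {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}},
}
V, policy = value_iteration(states, actions, cost, P)
```

Each sweep touches every state-action pair once, which is the "cheap iteration" property of value iteration; unlike policy iteration, however, plain value iteration only converges in the limit, whereas the paper's algorithm terminates with an exact optimum after finitely many such sweeps.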