A Unified Framework for Risk-sensitive Markov Decision Processes with Finite State and Action Spaces
We introduce a unified framework for incorporating risk into Markov decision processes (MDPs) via prospect maps, which generalize the coherent/convex risk measures of mathematical finance. Most existing risk-sensitive approaches from the various literatures on decision-making problems are contained in the framework as special cases. Within the framework, we solve the optimal control problem under two criteria: the newly introduced temporal discounted criterion, which generalizes the conventional discounting scheme, and the average criterion. Both are solved by value iteration algorithms under different assumptions. We also propose two online algorithms for the optimal control problem when the exact MDP is unknown and has to be estimated during optimization.
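To make the idea of value iteration with a risk map concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes one specific convex risk map, the entropic certainty equivalent, substituted for the expectation in a standard discounted Bellman update. The function names, the parameter `lam`, and the two-state toy MDP are all hypothetical choices made for illustration.

```python
import math


def entropic_map(values, probs, lam):
    """Entropic certainty equivalent (1/lam) * log E[exp(lam * X)].

    Assumption for this sketch: lam < 0 is risk-averse with respect to
    rewards, lam > 0 is risk-seeking, and lam == 0 falls back to the
    ordinary expectation.
    """
    if abs(lam) < 1e-12:
        return sum(p * v for p, v in zip(probs, values))
    # Shift by the largest exponent (log-sum-exp trick) for stability.
    shift = max(lam * v for v in values)
    total = sum(p * math.exp(lam * v - shift) for p, v in zip(probs, values))
    return (shift + math.log(total)) / lam


def risk_sensitive_value_iteration(P, R, gamma, lam, tol=1e-10, max_iter=10_000):
    """Iterate V(s) <- max_a [ R[s][a] + gamma * entropic_map(V, P[s][a]) ].

    P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward.
    The map is monotone and translation-invariant, so with gamma < 1 the
    update is a sup-norm contraction and the iteration converges.
    """
    n_states = len(P)

    def q(V, s, a):
        return R[s][a] + gamma * entropic_map(V, P[s][a], lam)

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(q(V, s, a) for a in range(len(P[s]))) for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [max(range(len(P[s])), key=lambda a: q(V, s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: state 1 is absorbing with zero reward.
# In state 0, action 0 is "safe" (reward 1, stay in state 0) and action 1 is
# "risky" (reward 2, but a 10% chance of falling into state 1).
P = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [
    [1.0, 2.0],
    [0.0, 0.0],
]

V_neutral, pi_neutral = risk_sensitive_value_iteration(P, R, gamma=0.9, lam=0.0)
V_averse, pi_averse = risk_sensitive_value_iteration(P, R, gamma=0.9, lam=-1.0)
print("risk-neutral:", V_neutral, pi_neutral)
print("risk-averse: ", V_averse, pi_averse)
```

In this toy example the risk-neutral controller prefers the risky action in state 0, while the risk-averse one (`lam = -1`) switches to the safe action. This illustrates how replacing the expectation with a risk map can change the optimal control.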