Approximate Dynamic Programming for Dynamic Quantile-Based Risk Measures
In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs (we call the overall objective a dynamic quantile-based risk measure (DQBRM)). In particular, we consider optimizing dynamic risk measures constructed using one-step risk measures that are a convex combination of the expectation and a QBRM, a class of risk measures that includes the popular value at risk (VaR) and the conditional value at risk (CVaR). Although there is considerable theoretical development of risk-averse MDPs in the literature, the computational challenges have not been explored as thoroughly. We propose a simulation-based approximate dynamic programming (ADP) algorithm to solve the risk-averse sequential decision problem. In addition, we address the issue of inefficient sampling in risk applications and present a procedure, based on importance sampling, to direct samples toward the "risky region" as the ADP algorithm progresses. Finally, we show numerical results of applying our algorithms in the context of an energy storage and bidding application.
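The one-step risk measures described above, a convex combination of the expectation and a QBRM such as VaR or CVaR, can be sketched empirically. This is an illustrative sample-based estimator, not the paper's algorithm; the weight `lam`, the quantile level `alpha`, and the tail-mean CVaR estimator are assumptions for the sketch.

```python
import numpy as np

def empirical_var(costs, alpha):
    """Empirical value at risk: the alpha-quantile of the sampled costs."""
    return np.quantile(costs, alpha)

def empirical_cvar(costs, alpha):
    """Empirical conditional value at risk: mean cost in the upper tail
    at or beyond the alpha-quantile (costs are to be minimized, so the
    risky region is the right tail)."""
    var = empirical_var(costs, alpha)
    tail = costs[costs >= var]
    return tail.mean()

def one_step_risk(costs, alpha=0.95, lam=0.5):
    """Convex combination of expectation and a QBRM (here CVaR):
    (1 - lam) * E[X] + lam * CVaR_alpha(X).
    lam = 0 recovers the risk-neutral expectation; lam = 1 is pure CVaR."""
    costs = np.asarray(costs, dtype=float)
    return (1.0 - lam) * costs.mean() + lam * empirical_cvar(costs, alpha)
```

In a dynamic (nested) formulation, a measure of this form would be applied recursively at each stage to the cost-to-go, rather than once to the total cost.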