262
v1v2 (latest)

Risk-averse learning with delayed feedback

Main:9 Pages
8 Figures
Bibliography:2 Pages
Abstract

In real-world scenarios, risk-averse learning is valuable for mitigating potential adverse outcomes. However, the delayed feedback makes it challenging to assess and manage risk effectively. In this paper, we investigate risk-averse learning using Conditional Value at Risk (CVaR) as risk measure, while incorporating feedback with random but bounded delays. We develop two risk-averse learning algorithms that rely on one-point and two-point zeroth-order optimization approaches, respectively. The dynamic regrets of the algorithms are analyzed in terms of the cumulative delay and the number of total samplings. In the absence of delay, the regret bounds match the established bounds of zeroth-order stochastic gradient methods for risk-averse learning. Furthermore, the two-point risk-averse learning outperforms the one-point algorithm by achieving a smaller regret bound. We provide numerical experiments on a dynamic pricing problem to demonstrate the performance of the algorithms.

View on arXiv
Comments on this paper