ResearchTrend.AI
arXiv:1604.05257

Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure

18 April 2016
Sattar Vakili
Qing Zhao
Abstract

Multi-armed bandit (MAB) problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in multi-armed bandit problems and develop parallel results under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. We show that the model-specific regret and the model-independent regret in terms of the mean-variance of the reward process are lower bounded by $\Omega(\log T)$ and $\Omega(T^{2/3})$, respectively. We then show that variations of the UCB policy and the DSEE policy developed for the classic risk-neutral MAB achieve these lower bounds.
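
To make the mean-variance measure concrete, the following is a minimal Python sketch. It assumes the common convention that the mean-variance of a reward sequence is its variance minus a risk-tolerance factor times its mean (lower is better). The abstract does not state this formula, so treat it as an assumption. The index rule is a generic lower-confidence-bound variation in the spirit of UCB, not necessarily the exact policy analyzed in the paper; the names `rho`, `b`, `mv_lcb_index`, and `run_mv_lcb` are hypothetical.

```python
import math
import random


def empirical_mean_variance(rewards, rho):
    """Sample variance minus rho times sample mean (lower is better).

    Assumed convention: MV = variance - rho * mean. The abstract does
    not spell out the formula, so this is an illustrative choice.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / n  # biased 1/n estimate
    return variance - rho * mean


def mv_lcb_index(rewards, t, rho, b=1.0):
    """Hypothetical lower-confidence-bound index on the mean-variance.

    Subtracts an exploration bonus that shrinks as the arm is sampled more.
    """
    bonus = b * math.sqrt(math.log(t) / len(rewards))
    return empirical_mean_variance(rewards, rho) - bonus


def run_mv_lcb(arms, horizon, rho, b=1.0, seed=0):
    """Simulate the index rule on Gaussian arms given as (mean, std) pairs.

    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    history = [[] for _ in arms]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1  # play every arm once to initialize its statistics
        else:
            # pick the arm with the smallest (most optimistic) index
            k = min(range(len(arms)),
                    key=lambda i: mv_lcb_index(history[i], t, rho, b))
        mu, sigma = arms[k]
        history[k].append(rng.gauss(mu, sigma))
    return [len(h) for h in history]


if __name__ == "__main__":
    # Two arms with equal mean but different spread; rho = 1.0 is arbitrary.
    print(run_mv_lcb([(1.0, 0.1), (1.0, 1.0)], horizon=500, rho=1.0))
```

Under this convention, a risk-averse learner prefers the arm with the smaller value of variance minus `rho` times mean, so two arms with the same mean are separated by their variance alone. The constants here are chosen only for illustration.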
