
Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms

Abstract

This paper introduces a general framework for risk-sensitive bandits that integrates risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The introduced framework subsumes various existing risk-sensitive models. An important and hitherto unknown observation is that, for a wide range of riskmetrics, the optimal bandit policy involves selecting a mixture of arms. This is in sharp contrast to the convention in multi-armed bandit algorithms that there is generally a solitary arm that maximizes the utility, whether purely reward-centric or risk-sensitive. This creates a major departure from the principles of designing bandit algorithms, since there are uncountably many mixture possibilities. The contributions of the paper are as follows: (i) it formalizes a general framework for risk-sensitive bandits, (ii) identifies standard risk-sensitive bandit models for which solitary arm selection is not optimal, and (iii) designs regret-efficient algorithms whose sampling strategies can accurately track the optimal arm mixtures (when a mixture is optimal) or the solitary arms (when a solitary arm is optimal). The algorithms are shown to achieve a regret that scales as $O((\log T / T)^{\nu})$, where $T$ is the horizon and $\nu > 0$ is a riskmetric-specific constant.
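
To see why a mixture of arms can strictly dominate every solitary arm, consider a distortion riskmetric of the form $\rho_h(X) = \int_0^\infty h(\bar F(x))\,dx$, where $\bar F$ is the survival function of the reward and $h$ is a distortion function. The sketch below is a hypothetical two-arm instance (the arm distributions, the concave distortion $h(u) = \sqrt{u}$, and all numbers are illustrative assumptions, not taken from the paper): both solitary arms attain $\rho_h = 0.5$, yet an interior mixture achieves about $0.577$.

import numpy as np

# Hypothetical toy instance (not from the paper) illustrating arm-mixture
# optimality under the distortion riskmetric
#   rho_h(X) = int_0^inf h( P(X > x) ) dx,
# with the concave distortion h(u) = sqrt(u). Arm 1 pays 0.5 surely;
# arm 2 pays 1 with probability 0.25 and 0 otherwise. Both solitary arms
# have rho_h = 0.5, yet an interior mixture strictly beats them.

def mixture_survival(alpha, x):
    """Survival function of the mixture putting weight alpha on arm 2."""
    s1 = (x < 0.5).astype(float)   # arm 1: point mass at 0.5
    s2 = 0.25 * (x < 1.0)          # arm 2: P(X > x) = 0.25 on [0, 1)
    return (1.0 - alpha) * s1 + alpha * s2

def rho(alpha, h=np.sqrt, n=100_000):
    """Distortion riskmetric of the mixture via a Riemann sum on [0, 1]."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    return h(mixture_survival(alpha, x)).mean()  # rewards lie in [0, 1]

alphas = np.linspace(0.0, 1.0, 201)
values = np.array([rho(a) for a in alphas])
print(f"solitary arms: rho(0) = {rho(0.0):.4f}, rho(1) = {rho(1.0):.4f}")
print(f"best mixture : alpha = {alphas[values.argmax()]:.3f}, "
      f"rho = {values.max():.4f}")
# -> both solitary arms give 0.5; alpha ~ 1/3 gives rho ~ 0.577

The interior optimum here follows from Jensen's inequality: for a strictly concave $h$, $h(\alpha \bar F_2 + (1-\alpha)\bar F_1) \geq \alpha\, h(\bar F_2) + (1-\alpha)\, h(\bar F_1)$ pointwise, so mixing two arms with equal riskmetric but different distributions strictly improves $\rho_h$, which no solitary-arm policy can capture.
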

@article{tatlı2025_2503.08896,
  title={Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms},
  author={Meltem Tatlı and Arpan Mukherjee and Prashanth L.A. and Karthikeyan Shanmugam and Ali Tajer},
  journal={arXiv preprint arXiv:2503.08896},
  year={2025}
}