Variance-Optimal Arm Selection: Misallocation Minimization and Best Arm Identification
This paper focuses on selecting the arm with the highest variance from a set of independent arms. Specifically, we focus on two settings: (i) misallocation minimization setting, that penalizes the number of pulls of suboptimal arms in terms of variance, and (ii) fixed-budget best arm identification setting, that evaluates the ability of an algorithm to determine the arm with the highest variance after a fixed number of pulls. We develop a novel online algorithm called UCB-VV for the misallocation minimization (MM) and show that its upper bound on misallocation for bounded rewards evolves as where is the horizon. By deriving the lower bound on the misallocation, we show that UCB-VV is order optimal. For the fixed budget best arm identification (BAI) setting we propose the SHVV algorithm. We show that the upper bound of the error probability of SHVV evolves as , where represents the complexity of the problem, and this rate matches the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance and standard deviation. Leveraging the same, we derive a concentration inequality for the empirical Sharpe ratio (SR) for sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that UCB-VV consistently outperforms -greedy across different sub-optimality gaps though it is surpassed by VTS, which exhibits the lowest misallocation, albeit lacking in theoretical guarantees. We also illustrate the superior performance of SHVV, for a fixed budget setting under 6 different setups against uniform sampling. Finally, we conduct a case study to empirically evaluate the performance of the UCB-VV and SHVV in call option trading on stocks generated using GBM.
View on arXiv