Trading Off Resource Budgets for Improved Regret Bounds

Neural Information Processing Systems (NeurIPS), 2022
Abstract

In this work we consider a variant of adversarial online learning where in each round one picks $B$ out of $N$ arms and incurs cost equal to the \textit{minimum} of the costs of the arms chosen. We propose an algorithm called Follow the Perturbed Multiple Leaders (FPML) for this problem, which we show (by adapting the techniques of Kalai and Vempala [2005]) achieves expected regret $\mathcal{O}(T^{\frac{1}{B+1}}\ln(N)^{\frac{B}{B+1}})$ over time horizon $T$ relative to the \textit{single} best arm in hindsight. This introduces a trade-off between the budget $B$ and the single-best-arm regret, and we proceed to investigate several applications of this trade-off. First, we observe that algorithms which use standard regret minimizers as subroutines can sometimes be adapted by replacing these subroutines with FPML, and we use this to generalize existing algorithms for Online Submodular Function Maximization [Streeter and Golovin, 2008] in both the full feedback and semi-bandit feedback settings. Next, we empirically evaluate our new algorithms on an online black-box hyperparameter optimization problem. Finally, we show how FPML can lead to new algorithms for Linear Programming which require stronger oracles but fewer oracle calls.
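To make the setting concrete, here is a minimal sketch of one round in the Follow-the-Perturbed-Multiple-Leaders style described above: perturb each arm's cumulative cost with random noise, pick the $B$ arms whose perturbed totals are smallest, and incur the minimum cost among them. The exponential perturbation and the scale parameter `eta` are assumptions for illustration, not the paper's exact construction.

```python
import random

def fpml_choose(cum_costs, B, eta):
    """Pick the B arms with the smallest perturbed cumulative cost.

    cum_costs: cumulative cost of each arm so far.
    B: budget, i.e. number of arms to pick per round.
    eta: perturbation scale (an assumed hyperparameter).
    """
    # Subtracting exponential noise favors each arm with some probability,
    # as in Follow-the-Perturbed-Leader-style analyses.
    perturbed = [c - random.expovariate(1.0 / eta) for c in cum_costs]
    return sorted(range(len(cum_costs)), key=lambda i: perturbed[i])[:B]

def play_round(cum_costs, round_costs, B, eta):
    """Choose B arms, incur the minimum cost among them, then update totals."""
    chosen = fpml_choose(cum_costs, B, eta)
    incurred = min(round_costs[i] for i in chosen)
    # Full-feedback setting: every arm's cost is revealed after the round.
    for i, c in enumerate(round_costs):
        cum_costs[i] += c
    return chosen, incurred
```

With $B = 1$ this reduces to ordinary Follow the Perturbed Leader; larger budgets cost more arms per round but, per the bound above, shrink the regret's dependence on $T$.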
