
Online learning over a finite action set with limited switching

Abstract

This paper studies the value of switching actions in the Prediction From Experts (PFE) problem and Adversarial Multi-Armed Bandits (MAB) problem. First, we revisit the well-studied and practically motivated setting of PFE with switching costs. Many algorithms are known to achieve the minimax optimal order of $O(\sqrt{T \log n})$ in expectation for both regret and number of switches, where $T$ is the number of iterations and $n$ the number of actions. However, no high probability (h.p.) guarantees are known. Our main technical contribution is the first algorithms which with h.p. achieve this optimal order for both regret and switches. This settles an open problem of [Devroye et al., 2015], and directly implies the first h.p. guarantees for several problems of interest. Next, to investigate the value of switching actions at a more granular level, we introduce the setting of switching budgets, in which algorithms are limited to $S \leq T$ switches between actions. This entails a limited number of free switches, in contrast to the unlimited number of expensive switches in the switching cost setting. Using the above result and several reductions, we unify previous work and completely characterize the complexity of this switching budget setting up to small polylogarithmic factors: for both PFE and MAB, for all switching budgets $S \leq T$, and for both expectation and h.p. guarantees. For PFE, we show the optimal rate is $\tilde{\Theta}(\sqrt{T \log n})$ for $S = \Omega(\sqrt{T \log n})$, and $\min(\tilde{\Theta}(\tfrac{T \log n}{S}), T)$ for $S = O(\sqrt{T \log n})$. Interestingly, the bandit setting does not exhibit such a phase transition; instead we show the minimax rate decays steadily as $\min(\tilde{\Theta}(\tfrac{T\sqrt{n}}{\sqrt{S}}), T)$ for all ranges of $S \leq T$. These results recover and generalize the known minimax rates for the (arbitrary) switching cost setting.
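
To make the contrast between the two regimes concrete, the stated rates can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names `pfe_rate` and `mab_rate` are hypothetical, all constants and polylogarithmic factors hidden by $\tilde{\Theta}$ are set to 1, and the PFE phase transition is placed exactly at $S = \sqrt{T \log n}$ (an assumption; the abstract fixes the threshold only up to constants).

```python
import math


def pfe_rate(T: int, n: int, S: int) -> float:
    """Order of the minimax regret for PFE with a switching budget S.

    Illustrative only: constants and polylog factors are dropped.
    """
    if S <= 0:
        # With no switches allowed we fall back to the trivial bound T.
        return float(T)
    threshold = math.sqrt(T * math.log(n))
    if S >= threshold:
        # High-budget regime: extra switches no longer help.
        return threshold
    # Low-budget regime: regret scales like T log n / S, capped at T.
    return min(T * math.log(n) / S, float(T))


def mab_rate(T: int, n: int, S: int) -> float:
    """Order of the minimax regret for adversarial MAB with budget S.

    Illustrative only: constants and polylog factors are dropped.
    """
    if S <= 0:
        return float(T)
    # No phase transition: the rate decays steadily in S, capped at T.
    return min(T * math.sqrt(n) / math.sqrt(S), float(T))


if __name__ == "__main__":
    T, n = 10_000, 16
    for S in (1, 10, 100, 1_000, 10_000):
        print(S, round(pfe_rate(T, n, S), 1), round(mab_rate(T, n, S), 1))
```

Under these simplifications, the PFE value stops improving once $S$ passes $\sqrt{T \log n}$, while the MAB value keeps decreasing all the way to $S = T$, which is the phase-transition contrast the abstract describes.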
