arXiv:2102.09864
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

19 February 2021
Chloé Rouyer
Yevgeny Seldin
Nicolò Cesa-Bianchi
Abstract

We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played. Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((\lambda K)^{1/3}T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} \Delta_i^{-1}\right)$, where the $\Delta_i$ are the suboptimality gaps and $i^*$ is the unique optimal arm. In the special case of $\lambda = 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem in which the switching cost is allowed to change over time. We provide an experimental evaluation showing the competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with fixed switching cost.
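To make the underlying machinery concrete, here is a minimal sketch of the base Tsallis-INF algorithm (Zimmert and Seldin's 1/2-Tsallis-entropy FTRL with importance-weighted loss estimates) that the paper adapts. This is not the authors' switching-cost variant: the learning-rate schedule `2/sqrt(t)`, the Newton solver tolerance, and the simple per-round switch counter are assumptions of this sketch, and the paper's modifications for paying $\lambda$ per switch are not reproduced here.

```python
import math
import random


def tsallis_inf_weights(L, eta, iters=50):
    """Compute the Tsallis-INF arm distribution w_i = 4 / (eta * (L_i - x))^2,
    where the normalizer x < min(L) is found by Newton's method so the
    weights sum to 1."""
    # Start at a point where every weight is <= 1, so f(x) >= 0 and
    # Newton's method on the convex, increasing f converges monotonically.
    x = min(L) - 2.0 / eta
    for _ in range(iters):
        w = [4.0 / (eta * (Li - x)) ** 2 for Li in L]
        f = sum(w) - 1.0
        # d w_i / dx = 8 / (eta^2 (L_i - x)^3) = eta * w_i^{3/2}
        fprime = sum(eta * wi ** 1.5 for wi in w)
        x -= f / fprime
    w = [4.0 / (eta * (Li - x)) ** 2 for Li in L]
    s = sum(w)
    return [wi / s for wi in w]


def tsallis_inf(K, T, loss_fn, seed=0):
    """Run anytime Tsallis-INF for T rounds over K arms.

    `loss_fn(t, arm)` returns the observed loss in [0, 1]. Returns the
    cumulative importance-weighted loss estimates and the number of arm
    switches (which the paper's setting charges lambda each)."""
    rng = random.Random(seed)
    L_hat = [0.0] * K       # importance-weighted cumulative loss estimates
    switches, prev_arm = 0, None
    for t in range(1, T + 1):
        eta = 2.0 / math.sqrt(t)        # assumed anytime learning rate
        p = tsallis_inf_weights(L_hat, eta)
        arm = rng.choices(range(K), weights=p)[0]
        if prev_arm is not None and arm != prev_arm:
            switches += 1
        prev_arm = arm
        loss = loss_fn(t, arm)
        L_hat[arm] += loss / p[arm]     # unbiased importance-weighted update
    return L_hat, switches
```

The sampling distribution interpolates between UCB-like exploitation and EXP3-like exploration, which is what lets a single algorithm handle both the stochastic and the adversarial regimes without knowing which one it faces.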
