arXiv:1807.07623, v6 (latest)

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits

19 July 2018
Julian Zimmert
Yevgeny Seldin
Abstract

We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent with Tsallis entropy regularizer. We provide a complete characterization of such algorithms and show that Tsallis entropy with power α = 1/2 achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: stochastic bandits with adversarial corruptions introduced by Lykouris et al., and the stochastically constrained adversary studied by Wei and Luo. The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it outperforms UCB1 and EXP3 in stochastic environments. We also provide examples of adversarial environments, where UCB1 and Thompson Sampling exhibit almost linear regret, whereas our algorithm suffers only "logarithmic" regret.
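
To make the idea of online mirror descent with a Tsallis entropy regularizer of power α = 1/2 more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' reference implementation: the abstract does not give the update rule, so the regularizer scaling (−4 Σ √w_i), the learning rate η_t = 1/√t, the importance-weighted loss estimator, and the bisection solver are all choices made for this sketch. The function names `tsallis_weights`, `tsallis_inf`, and `bernoulli_losses` are hypothetical. With that regularizer, the minimizer over the probability simplex has the form w_i = 4 / (η (L_i − x))², where x is a normalization constant chosen so that the weights sum to one.

```python
import math
import random


def tsallis_weights(cum_losses, eta, iters=60):
    """Sampling distribution for the 1/2-Tsallis regularizer (sketch).

    Finds x such that w_i = 4 / (eta * (L_i - x))**2 sums to 1.
    Bisection is an assumption of this sketch; any 1-D root finder works.
    """
    k = len(cum_losses)
    l_min = min(cum_losses)
    # At x = l_min - 2*sqrt(k)/eta every weight is <= 1/k, so the sum is <= 1.
    # At x = l_min - 2/eta the best arm alone has weight 1, so the sum is >= 1.
    lo = l_min - 2.0 * math.sqrt(k) / eta
    hi = l_min - 2.0 / eta
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (l - x)) ** 2 for l in cum_losses)
        if total > 1.0:
            hi = x  # weights too large: move x away from the losses
        else:
            lo = x
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (l - x)) ** 2 for l in cum_losses]
    s = sum(w)
    return [p / s for p in w]  # remove the residual bisection error


def tsallis_inf(loss_fn, n_arms, horizon, rng):
    """Run the sketch for `horizon` rounds; returns the pull count per arm.

    loss_fn(t, arm) must return a loss in [0, 1]. The horizon is only the
    loop length here; the learning rate itself does not depend on it.
    """
    cum_est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed schedule, see lead-in
        w = tsallis_weights(cum_est, eta)
        arm = rng.choices(range(n_arms), weights=w)[0]
        loss = loss_fn(t, arm)
        cum_est[arm] += loss / w[arm]  # unbiased estimate of the arm's loss
        pulls[arm] += 1
    return pulls


def bernoulli_losses(means, rng):
    """Stochastic environment: arm a incurs loss 1 with probability means[a]."""
    return lambda t, arm: 1.0 if rng.random() < means[arm] else 0.0


if __name__ == "__main__":
    env = bernoulli_losses([0.9, 0.5, 0.9], random.Random(1))
    print(tsallis_inf(env, n_arms=3, horizon=2000, rng=random.Random(0)))
```

The same loop can be run against an adversarial loss sequence by passing a different `loss_fn`; nothing in the update depends on which regime generated the losses, which is the property the abstract emphasizes.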
