Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

28 January 2022
Jiatai Huang
Yan Dai
Longbo Huang
Abstract

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th moments ($1<\alpha\le 2$) bounded by $\sigma^\alpha$, while the variances may not exist. Specifically, we design an algorithm HTINF: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When $\alpha$ and $\sigma$ are unknown, HTINF achieves a $\log T$-style instance-dependent regret bound in the stochastic case and an $o(T)$ no-regret guarantee in the adversarial case. We further develop an algorithm AdaTINF, achieving the minimax optimal regret $\mathcal{O}(\sigma K^{1-1/\alpha} T^{1/\alpha})$ even in adversarial settings, without prior knowledge of $\alpha$ and $\sigma$. This result matches the known regret lower bound (Bubeck et al., 2013), which was derived assuming a stochastic environment with both $\alpha$ and $\sigma$ known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound in both the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.
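To make the feedback model concrete, below is a minimal, illustrative Python sketch (not from the paper) of a $K$-armed bandit whose per-round losses are heavy-tailed: the $\alpha$-th moment is finite for the chosen tail index while the variance is not, matching the regime $1<\alpha\le 2$ described above. The arm means, Pareto tail index, and the naive round-robin baseline are assumptions chosen purely for illustration; the paper's HTINF and AdaTINF are INF-type (follow-the-regularized-leader) algorithms, not uniform exploration.

```python
import numpy as np

# Illustrative heavy-tailed K-armed bandit environment (not the paper's code).
# Losses are mean_i plus centered Pareto (Lomax) noise with tail index a = 1.5,
# so E[|loss|^alpha] is finite for alpha < 1.5 while the variance (alpha = 2)
# does not exist -- the heavy-tailed regime studied in the paper.

class HeavyTailedBandit:
    def __init__(self, means, tail_index=1.5, seed=0):
        self.means = np.asarray(means, dtype=float)  # per-arm expected losses
        self.tail_index = tail_index                 # Pareto shape parameter a
        self.rng = np.random.default_rng(seed)

    def pull(self, arm):
        # Lomax(a) noise has E[X^p] < infinity iff p < a.
        noise = self.rng.pareto(self.tail_index)
        # Center the noise so the expected loss of `arm` stays at means[arm]
        # (the mean of Lomax(a) with unit scale is 1 / (a - 1) for a > 1).
        return self.means[arm] + (noise - 1.0 / (self.tail_index - 1.0))

# Toy usage: a naive round-robin baseline, for illustration only.
env = HeavyTailedBandit(means=[0.2, 0.5, 0.8])
T, K = 1000, 3
loss_sums, counts = np.zeros(K), np.zeros(K)
for t in range(T):
    arm = t % K
    loss_sums[arm] += env.pull(arm)
    counts[arm] += 1
print("empirical mean losses:", loss_sums / counts)
```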
