  3. 2001.09390
Regime Switching Bandits

26 January 2020
Xiang Zhou, Yi Xiong, Ningyuan Chen, Xuefeng Gao
Abstract

We study a multi-armed bandit problem where the rewards exhibit regime switching. Specifically, the distributions of the random rewards generated from all arms are modulated by a common underlying state modeled as a finite-state Markov chain. The agent does not observe the underlying state and has to learn the transition matrix and the reward distributions. We propose a learning algorithm for this problem, building on spectral method-of-moments estimation for hidden Markov models, belief error control in partially observable Markov decision processes, and upper-confidence-bound methods for online learning. We also establish a regret upper bound $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm, where $T$ is the learning horizon. Finally, we conduct proof-of-concept experiments to illustrate the performance of the learning algorithm.
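The reward model described in the abstract can be illustrated with a minimal simulation: a hidden two-state Markov chain modulates the Bernoulli means of two arms, and an agent tracks a belief over the hidden state via standard HMM filtering. This is a hedged sketch only — the transition matrix `P` and reward means `mu` are assumed known here and the policy is myopic belief-greedy, whereas the paper's algorithm must *learn* these parameters (via spectral method-of-moments estimation) and uses an upper-confidence-bound policy.

```python
import numpy as np

# Illustrative regime-switching bandit (parameters assumed known,
# unlike in the paper, where they must be learned).
rng = np.random.default_rng(0)

P = np.array([[0.95, 0.05],    # hidden-state transition matrix
              [0.10, 0.90]])
mu = np.array([[0.9, 0.1],     # mu[state, arm]: Bernoulli reward means
               [0.2, 0.8]])

def run(T=2000):
    state = 0
    belief = np.array([0.5, 0.5])   # filtered P(state | past rewards)
    total = 0.0
    for _ in range(T):
        # Belief-greedy arm choice (a myopic stand-in for the
        # paper's UCB-based policy).
        arm = int(np.argmax(belief @ mu))
        reward = float(rng.random() < mu[state, arm])
        total += reward
        # HMM filter: correct with the likelihood of the observed
        # reward from the pulled arm, then predict one step ahead.
        like = np.where(reward == 1.0, mu[:, arm], 1.0 - mu[:, arm])
        belief = like * belief
        belief /= belief.sum()
        belief = P.T @ belief
        # The hidden state evolves regardless of the agent's action.
        state = int(rng.choice(2, p=P[state]))
    return total, belief

total, belief = run(500)
```

Because the chain is sticky (self-transition probabilities 0.95 and 0.90), the filtered belief concentrates on the current regime within a few pulls after each switch, which is the partial-observability structure the paper's belief-error-control analysis addresses.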
