Restless Hidden Markov Bandits with Linear Rewards

22 October 2019

Abstract

This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem, the reward received by the decision maker is a random linear function that depends on the arm selected and on a hidden state. In contrast to previous work on Markovian bandits, we do not assume that the decision maker receives information about the state of the system; instead, the decision maker has to infer the state from its actions and the rewards received. Surprisingly, logarithmic regret can still be maintained when the action set is polyhedral. Furthermore, the regret does not depend on the number of extreme points of the action set.
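To make the setting concrete, the following is a minimal sketch of the environment only, not of the paper's algorithm. It assumes a finite hidden Markov chain that evolves regardless of the action played (the "restless" part), a mean reward vector per hidden state, and Gaussian noise on that vector. The function name `simulate_rewards` and the parameters `transition`, `state_means`, and `noise_std` are hypothetical and do not come from the paper.

```python
import random

def simulate_rewards(actions, transition, state_means, noise_std=0.1, seed=0):
    """Toy restless hidden Markov bandit with linear rewards (illustrative assumptions).

    actions:     list of action vectors, one per round
    transition:  row-stochastic matrix of the hidden chain (list of lists)
    state_means: mean reward vector for each hidden state (list of lists)
    Returns the rewards and the hidden states; a learner would only see the rewards.
    """
    rng = random.Random(seed)
    n_states = len(state_means)
    state = rng.randrange(n_states)  # initial hidden state, chosen uniformly (assumption)
    rewards, states = [], []
    for x in actions:
        # Random linear reward: noisy state-dependent vector, inner product with the action.
        theta = [m + rng.gauss(0.0, noise_std) for m in state_means[state]]
        rewards.append(sum(t * xi for t, xi in zip(theta, x)))
        states.append(state)
        # The hidden chain moves on whatever action was played.
        state = rng.choices(range(n_states), weights=transition[state])[0]
    return rewards, states
```

Calling `simulate_rewards([[1.0, 0.0]] * 100, [[0.9, 0.1], [0.2, 0.8]], [[1.0, 0.0], [0.0, 1.0]])` returns 100 rewards together with the hidden state sequence that produced them. The state sequence is returned only for inspection; in the problem described above the decision maker never observes it.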
