LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the -\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter , which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves regret under the margin condition.
View on arXiv