
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

Main: 9 pages
Bibliography: 2 pages
Appendix: 22 pages
1 table
Abstract

We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}} T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition.
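To give a feel for the FTRL-with-Tsallis-entropy update that the abstract refers to, the sketch below computes the arm-sampling distribution of the standard Tsallis-INF step with $\alpha = 1/2$ (the non-contextual building block; the paper's LC variant additionally conditions on the observed context, which is omitted here). In closed form, the FTRL solution assigns weight $w_i = 4/(\eta(L_i - x))^2$ to arm $i$, where $L_i$ is the cumulative estimated loss, $\eta$ the learning rate, and $x < \min_i L_i$ is a normalizer making the weights sum to one; we locate $x$ by bisection. All function and variable names here are illustrative, not from the paper.

```python
import math

def tsallis_inf_weights(cum_losses, eta, iters=100):
    """Sampling distribution of FTRL regularized by 1/2-Tsallis entropy.

    Solves for the normalizer x < min_i L_i such that
        sum_i 4 / (eta * (L_i - x))**2 = 1,
    then returns w_i = 4 / (eta * (L_i - x))**2.
    This is the plain Tsallis-INF step, not the paper's full
    contextual algorithm.
    """
    L = list(cum_losses)
    K = len(L)
    m = min(L)
    hi = m - 1e-12                      # weight sum blows up as x -> min(L)
    lo = m - 2.0 * math.sqrt(K) / eta   # here each w_i <= 1/K, so sum <= 1

    def weight_sum(x):
        return sum(4.0 / (eta * (Li - x)) ** 2 for Li in L)

    for _ in range(iters):              # bisection: weight_sum is monotone in x
        mid = 0.5 * (lo + hi)
        if weight_sum(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (Li - x)) ** 2 for Li in L]
    s = sum(w)
    return [wi / s for wi in w]         # renormalize the tiny residual
```

With equal cumulative losses the distribution is uniform; as one arm's loss grows, its probability decays polynomially rather than exponentially, which is the property driving Tsallis-INF's best-of-both-worlds behavior.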
