
On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

International Conference on Machine Learning (ICML), 2023
Abstract

We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class up to a bounded misspecification level $\zeta > 0$. We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. We show that, when the misspecification level $\zeta$ is dominated by $\tilde O(\Delta/\sqrt{d})$, with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O(d^2/\Delta)$ as in the well-specified setting, up to logarithmic factors. In addition, we show that the existing algorithm SupLinUCB (Chu et al., 2011) can also achieve a gap-dependent constant regret bound without knowledge of the sub-optimality gap $\Delta$. Together with a lower bound adapted from Lattimore et al. (2020), our results suggest an interplay between the misspecification level and the sub-optimality gap: (1) the linear contextual bandit model is efficiently learnable when $\zeta \leq \tilde O(\Delta/\sqrt{d})$; and (2) it is not efficiently learnable when $\zeta \geq \tilde\Omega(\Delta/\sqrt{d})$. Experiments on both synthetic and real-world datasets corroborate our theoretical results.
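The abstract describes the key algorithmic idea only at a high level: a LinUCB-style learner that feeds a context into its online regression only when that context carries large uncertainty. The sketch below is a minimal illustration of that data selection scheme, not the paper's exact algorithm; the class name, the `threshold` parameter, and the method names are illustrative assumptions.

```python
import numpy as np

class UncertaintyFilteredLinUCB:
    """Hypothetical sketch: LinUCB with uncertainty-based data selection."""

    def __init__(self, d, alpha=1.0, threshold=0.1, lam=1.0):
        self.A = lam * np.eye(d)    # regularized Gram matrix of selected contexts
        self.b = np.zeros(d)        # accumulated reward-weighted features
        self.alpha = alpha          # confidence-width multiplier
        self.threshold = threshold  # uncertainty level below which samples are discarded

    def _width(self, x, A_inv):
        # Elliptical confidence width ||x||_{A^{-1}}, the usual uncertainty measure.
        return np.sqrt(x @ A_inv @ x)

    def select_arm(self, contexts):
        # Standard optimistic arm selection over the candidate contextual vectors.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        ucb = [x @ theta + self.alpha * self._width(x, A_inv) for x in contexts]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        # Data selection: only contexts with large uncertainty enter the online
        # regression, so well-explored directions are not further contaminated
        # by the O(zeta) misspecification error.
        if self._width(x, np.linalg.inv(self.A)) >= self.threshold:
            self.A += np.outer(x, x)
            self.b += reward * x
```

The filtering step in `update` reflects the intuition stated in the abstract: once a direction is well explored, additional misspecified samples can only bias the regression, so discarding low-uncertainty contexts keeps the estimation error compatible with the gap condition $\zeta \leq \tilde O(\Delta/\sqrt{d})$.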
