
Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits

Neural Information Processing Systems (NeurIPS), 2024
Main: 9 pages · 1 figure · Bibliography: 2 pages · Appendix: 22 pages
Abstract

We study the generalized linear contextual bandit problem within the requirements of limited adaptivity. In this paper, we present two algorithms, B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited adaptivity models: batch learning with stochastic contexts and rare policy switches with adversarial contexts. For both these models, we establish essentially tight regret bounds. Notably, in the obtained bounds, we manage to eliminate a dependence on a key parameter $\kappa$, which captures the non-linearity of the underlying reward model. For our batch learning algorithm B-GLinCB, with $\Omega(\log\log T)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$. Further, we establish that our rarely switching algorithm RS-GLinCB updates its policy at most $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$. Our approach for removing the dependence on $\kappa$ for generalized linear contextual bandits might be of independent interest.
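The abstract does not spell out the algorithms, but the "rare policy switches" model it refers to is commonly implemented with a determinant-doubling trigger: keep acting greedily with a stale estimate, and refit the policy only when the determinant of the regularized design matrix has doubled since the last update. The sketch below is a generic illustration of that idea on a logistic (generalized linear) reward model; it is not the paper's RS-GLinCB, and all names, step sizes, and the 5-arm context distribution are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 2000
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)  # unknown parameter, unit norm


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


V = np.eye(d)                 # regularized design matrix V_t = I + sum a a^T
last_det = np.linalg.det(V)   # det at the time of the last policy switch
X, Y = [], []                 # history of played contexts and binary rewards
theta_hat = np.zeros(d)       # stale estimate used by the current policy
switches = 0

for t in range(T):
    arms = rng.normal(size=(5, d))          # fresh contexts each round (assumed)
    # Greedy w.r.t. the *stale* estimate: the policy changes only on a switch.
    a = arms[np.argmax(arms @ theta_hat)]
    r = rng.binomial(1, sigmoid(a @ theta_star))  # logistic reward model
    V += np.outer(a, a)
    X.append(a)
    Y.append(r)
    # Rare-switch trigger: det(V) has doubled since the last policy update.
    if np.linalg.det(V) > 2 * last_det:
        last_det = np.linalg.det(V)
        switches += 1
        # Refit theta_hat with a few gradient steps on the logistic MLE loss
        # (a crude stand-in for a proper optimistic/MLE update).
        Xa, Ya = np.array(X), np.array(Y)
        for _ in range(50):
            g = Xa.T @ (sigmoid(Xa @ theta_hat) - Ya)
            theta_hat -= 0.1 * g / len(Ya)

print(switches)  # number of policy updates; grows like O(d log T), not T
```

Because det(V) can double only logarithmically often per dimension, the policy is updated O(d log T) times over T rounds, which is the flavor of the poly-logarithmic switching budget the abstract claims for RS-GLinCB.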
