
Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits

Neural Information Processing Systems (NeurIPS), 2024
Main: 9 pages · 1 figure · Bibliography: 2 pages · Appendix: 22 pages
Abstract

We study the generalized linear contextual bandit problem within the requirements of limited adaptivity. In this paper, we present two algorithms, B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited adaptivity models: batch learning with stochastic contexts and rare policy switches with adversarial contexts. For both these models, we establish essentially tight regret bounds. Notably, in the obtained bounds, we manage to eliminate a dependence on a key parameter $\kappa$, which captures the non-linearity of the underlying reward model. For our batch learning algorithm B-GLinCB, with $\Omega(\log\log T)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$. Further, we establish that our rarely switching algorithm RS-GLinCB updates its policy at most $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$. Our approach for removing the dependence on $\kappa$ for generalized linear contextual bandits might be of independent interest.
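The abstract does not spell out the algorithms, but the "rare policy switches" model it refers to is commonly implemented with a determinant-doubling trigger: keep acting greedily with a stale estimate, and refit the policy only when the determinant of the regularized design matrix has doubled since the last update. The sketch below is a generic illustration of that idea on a logistic (generalized linear) reward model; it is not the paper's RS-GLinCB, and all names, step sizes, and the 5-arm context distribution are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 2000
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)  # unknown parameter, unit norm


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


V = np.eye(d)                 # regularized design matrix V_t = I + sum a a^T
last_det = np.linalg.det(V)   # det at the time of the last policy switch
X, Y = [], []                 # history of played contexts and binary rewards
theta_hat = np.zeros(d)       # stale estimate used by the current policy
switches = 0

for t in range(T):
    arms = rng.normal(size=(5, d))          # fresh contexts each round (assumed)
    # Greedy w.r.t. the *stale* estimate: the policy changes only on a switch.
    a = arms[np.argmax(arms @ theta_hat)]
    r = rng.binomial(1, sigmoid(a @ theta_star))  # logistic reward model
    V += np.outer(a, a)
    X.append(a)
    Y.append(r)
    # Rare-switch trigger: det(V) has doubled since the last policy update.
    if np.linalg.det(V) > 2 * last_det:
        last_det = np.linalg.det(V)
        switches += 1
        # Refit theta_hat with a few gradient steps on the logistic MLE loss
        # (a crude stand-in for a proper optimistic/MLE update).
        Xa, Ya = np.array(X), np.array(Y)
        for _ in range(50):
            g = Xa.T @ (sigmoid(Xa @ theta_hat) - Ya)
            theta_hat -= 0.1 * g / len(Ya)

print(switches)  # number of policy updates; grows like O(d log T), not T
```

Because det(V) can double only logarithmically often per dimension, the policy is updated O(d log T) times over T rounds, which is the flavor of the poly-logarithmic switching budget the abstract claims for RS-GLinCB.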
