
Multi-User Contextual Cascading Bandits for Personalized Recommendation

Main: 18 pages
Bibliography: 1 page
Appendix: 16 pages
Abstract

We introduce the Multi-User Contextual Cascading Bandit (MCCB) model, a new combinatorial bandit framework that captures realistic online advertising scenarios in which multiple users simultaneously interact with sequentially displayed items. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $\widetilde{O}(\sqrt{THN})$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, Active Upper Confidence Bound with Backward Planning (AUCBBP), which scales strictly better in the number of contexts (that is, users), with a regret bound of $\widetilde{O}(\sqrt{T+HN})$. We validate our theoretical findings via numerical experiments, demonstrating the empirical effectiveness of both algorithms under various settings.
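
To give a feel for the difference in context scaling, the following minimal sketch evaluates the two rates quoted in the abstract, $\sqrt{THN}$ and $\sqrt{T+HN}$, for a growing number of contexts. It is only an illustration: the constants and logarithmic factors hidden by $\widetilde{O}$ are dropped, so the numbers are not regret values, and the function names `ucbbp_rate` and `aucbbp_rate` as well as the chosen values of $T$, $H$, and $N$ are assumptions made for this example.

```python
import math


def ucbbp_rate(T: int, H: int, N: int) -> float:
    """sqrt(T*H*N): the UCBBP rate, ignoring constants and log factors."""
    return math.sqrt(T * H * N)


def aucbbp_rate(T: int, H: int, N: int) -> float:
    """sqrt(T + H*N): the AUCBBP rate, ignoring constants and log factors."""
    return math.sqrt(T + H * N)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    T, H = 10_000, 5
    for N in (1, 10, 100, 1000):
        print(
            f"N={N:5d}  "
            f"sqrt(THN)={ucbbp_rate(T, H, N):10.1f}  "
            f"sqrt(T+HN)={aucbbp_rate(T, H, N):8.1f}"
        )
```

In the first expression $N$ multiplies $T$ under the square root, whereas in the second it enters only additively through $HN$, which is the sense in which the abstract describes improved user scaling.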
