
Online Learning with Probing for Sequential User-Centric Selection

Main: 7 pages
Figures: 2
Bibliography: 1 page
Tables: 1
Appendix: 11 pages
Abstract

We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns $K$ plays to $M$ arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee $\zeta = (e-1)/(2e-1)$. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound of $\mathcal{O}(\sqrt{T} + \ln^{2} T)$. We also prove a lower bound of $\Omega(\sqrt{T})$, showing that the upper bound is tight up to logarithmic factors. Experiments on real-world data demonstrate the effectiveness of our solutions.
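
To make the stated quantities concrete, the short sketch below evaluates the approximation factor $\zeta = (e-1)/(2e-1)$ numerically and compares the two terms of the regret bound for a few horizons $T$. It is only an illustration: the helper name `regret_terms` is hypothetical, and all hidden constants in the $\mathcal{O}(\cdot)$ bound are set to 1, which is an assumption and not something the abstract states.

```python
import math

# Approximation factor stated in the abstract: zeta = (e - 1) / (2e - 1).
zeta = (math.e - 1) / (2 * math.e - 1)
print(f"zeta = {zeta:.4f}")  # roughly 0.387

def regret_terms(T):
    """Return (sqrt(T), ln(T)^2), the two terms of the stated bound.

    Hypothetical helper: every constant factor is assumed to be 1.
    """
    return math.sqrt(T), math.log(T) ** 2

# Compare how the two terms grow with the horizon T.
for T in (10**2, 10**4, 10**6, 10**8):
    sqrt_term, log_term = regret_terms(T)
    print(f"T={T:>9}  sqrt(T)={sqrt_term:10.1f}  ln^2(T)={log_term:7.1f}")
```

Under these unit-constant assumptions, the $\ln^{2} T$ term is the larger one at small horizons, while $\sqrt{T}$ dominates as $T$ grows, which is consistent with the bound matching the $\Omega(\sqrt{T})$ lower bound asymptotically.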
