Online Budget Allocation with Censored Semi-Bandit Feedback
We study a stochastic budget-allocation problem over $K$ tasks. At each round $t$, the learner chooses an allocation $b_t = (b_{t,1}, \dots, b_{t,K})$ of its budget across the tasks. Task $i$ succeeds with probability $f_i(b_{t,i})$, where $f_1, \dots, f_K$ are nondecreasing budget-to-success curves, and upon success yields a random reward with unknown mean $r_i$. The learner observes which tasks succeed, and observes a task's reward only upon success (censored semi-bandit feedback). This model captures, for instance, splitting payments across crowdsourcing workers or distributing bids across simultaneous auctions, and subsumes stochastic multi-armed bandits and semi-bandits.

We design an optimism-based algorithm that operates under censored semi-bandit feedback. Our main result shows that in diminishing-returns regimes, the regret of this algorithm scales polylogarithmically with the horizon without any ad hoc tuning. For general nondecreasing curves, we prove that the same algorithm (with the same tuning) achieves a sublinear worst-case regret upper bound. Finally, we establish a worst-case regret lower bound of the same order that holds even for full-feedback algorithms, showing this rate is unimprovable and highlighting the intrinsic hardness of our problem outside diminishing returns.
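To make the feedback structure concrete, here is a minimal sketch of one round of the censored semi-bandit interaction. The specific curve shapes, the Bernoulli reward distribution, and all names below are illustrative assumptions, not the paper's construction; the only feature taken from the model is that success indicators are always observed while rewards are revealed only on success.

```python
import math
import random

def simulate_round(allocation, curves, reward_means, rng):
    """One round of the (assumed) censored semi-bandit feedback model.

    allocation[i]   -- budget b_{t,i} given to task i
    curves[i]       -- nondecreasing budget-to-success curve f_i
    reward_means[i] -- unknown mean reward r_i (known only to the simulator)

    Returns a list of (success, reward) pairs: success flags are always
    observed; a task's reward sample is observed (non-None) only on success.
    """
    observations = []
    for b, f, mu in zip(allocation, curves, reward_means):
        success = rng.random() < f(b)  # task i succeeds w.p. f_i(b_{t,i})
        # Censoring: the reward (here an illustrative Bernoulli(mu) sample)
        # is revealed only when the task succeeds.
        reward = (rng.random() < mu) if success else None
        observations.append((success, reward))
    return observations

# Example: 3 tasks with saturating, diminishing-returns curves f(b) = 1 - e^{-c b}.
curves = [lambda b, c=c: 1 - math.exp(-c * b) for c in (0.5, 1.0, 2.0)]
obs = simulate_round([1.0, 1.0, 1.0], curves, [0.3, 0.5, 0.7], random.Random(0))
```

A learner facing this loop must estimate each $r_i$ from success-censored samples, which is what motivates the optimism-based construction described above.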