Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding
Motivated by applications in crowdsourcing, where a fixed sum of money is split among workers, and in autobidding, where a fixed budget is used to bid in simultaneous auctions, we define a stochastic bandit model in which arms belong to the -dimensional probability simplex and represent the fractions of the budget allocated to each task/auction. The reward in each round is the sum of stochastic rewards, each of which is unlocked with a probability that depends on the fraction of the budget allocated to the corresponding task/auction. We design an algorithm whose expected regret after steps is of order (up to log factors) and prove a matching lower bound. Improved bounds of order are shown when the function mapping budget to probability of unlocking the reward (i.e., terminating the task or winning the auction) satisfies additional diminishing-returns conditions.
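The reward model described above can be illustrated with a minimal simulation sketch. Everything here is hypothetical scaffolding (the names `round_reward`, `unlock_prob`, and the specific concave map are not from the paper): an arm is a point on the probability simplex, and each task pays its reward with a probability determined by the budget fraction it receives.

```python
import numpy as np

rng = np.random.default_rng(0)

def round_reward(allocation, unlock_prob, rewards, rng):
    """One round of the model: task i pays rewards[i] with probability
    unlock_prob(allocation[i]); the round's reward is the sum of the
    unlocked payments. `unlock_prob` maps a budget fraction in [0, 1]
    to an unlock probability in [0, 1] (a hypothetical stand-in for the
    paper's budget-to-probability function)."""
    unlocked = rng.random(len(allocation)) < unlock_prob(allocation)
    return float(np.dot(unlocked, rewards))

# A hypothetical concave (diminishing-returns) unlock-probability map,
# of the kind under which the paper's improved bounds would apply.
concave = lambda x: 1.0 - np.exp(-3.0 * x)

K = 4                              # number of tasks/auctions (example value)
alloc = np.full(K, 1.0 / K)        # uniform split: a point on the simplex
rewards = np.ones(K)               # unit reward per unlocked task
print(round_reward(alloc, concave, rewards, rng))
```

A bandit algorithm in this model would choose `alloc` adaptively each round, observing only the realized sum; the sketch fixes a single allocation purely to show how one round's feedback is generated.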