Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding

François Bachoc
Nicolò Cesa-Bianchi
Tommaso Cesari
Roberto Colomboni
Main:11 Pages
Bibliography:3 Pages
Appendix:29 Pages
Abstract

Motivated by applications in crowdsourcing, where a fixed sum of money is split among $K$ workers, and autobidding, where a fixed budget is used to bid in $K$ simultaneous auctions, we define a stochastic bandit model where arms belong to the $K$-dimensional probability simplex and represent the fraction of budget allocated to each task/auction. The reward in each round is the sum of $K$ stochastic rewards, where each of these rewards is unlocked with a probability that varies with the fraction of the budget allocated to that task/auction. We design an algorithm whose expected regret after $T$ steps is of order $K\sqrt{T}$ (up to log factors) and prove a matching lower bound. Improved bounds of order $K (\log T)^2$ are shown when the function mapping budget to probability of unlocking the reward (i.e., terminating the task or winning the auction) satisfies additional diminishing-returns conditions.
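
To make the reward model concrete, the following is a minimal simulation sketch of a single round, not the paper's algorithm. Several details are assumptions made only for illustration: the unlock-probability map `unlock_probability` (a hypothetical concave function $1 - e^{-cx}$ with rate `rate`), Bernoulli task rewards with means `reward_means`, and independence between unlocking and the reward itself. The abstract specifies none of these.

```python
import math
import random


def unlock_probability(fraction, rate=3.0):
    """Hypothetical concave map from budget fraction to unlock probability.

    The form 1 - exp(-rate * x) is an illustrative assumption only.
    """
    return 1.0 - math.exp(-rate * fraction)


def expected_reward(allocation, reward_means):
    """Expected round reward under the assumed independence of unlocking and reward."""
    return sum(unlock_probability(a) * m for a, m in zip(allocation, reward_means))


def play_round(allocation, reward_means, rng):
    """Play one round; `allocation` must be a point of the probability simplex."""
    if any(a < 0 for a in allocation) or abs(sum(allocation) - 1.0) > 1e-9:
        raise ValueError("allocation must lie in the probability simplex")
    total = 0.0
    for fraction, mean in zip(allocation, reward_means):
        # The task/auction pays out only if its reward is unlocked.
        unlocked = rng.random() < unlock_probability(fraction)
        # Assumed Bernoulli reward, drawn independently of the unlocking event.
        reward = 1.0 if rng.random() < mean else 0.0
        if unlocked:
            total += reward
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.5, 0.5]
    for allocation in ([1.0, 0.0], [0.5, 0.5]):
        average = sum(play_round(allocation, means, rng) for _ in range(10_000)) / 10_000
        print(allocation, round(expected_reward(allocation, means), 3), round(average, 3))
```

Under this assumed concave map, splitting the budget evenly yields a higher expected reward than concentrating it on one task, which is the kind of diminishing-returns structure the improved bounds refer to.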