Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding
Motivated by applications in crowdsourcing, where a fixed sum of money is split among workers, and in autobidding, where a fixed budget is used to bid in simultaneous auctions, we define a stochastic bandit model in which arms belong to the -dimensional probability simplex and represent the fractions of the budget allocated to each task/auction. The reward in each round is the sum of stochastic rewards, each of which is unlocked with a probability that depends on the fraction of the budget allocated to the corresponding task/auction. We design an algorithm whose expected regret after steps is of order (up to log factors) and prove a matching lower bound. Improved bounds of order are shown when the function mapping budget to probability of unlocking the reward (i.e., terminating the task or winning the auction) satisfies additional diminishing-returns conditions.
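The reward model described above can be illustrated with a minimal simulation sketch. Everything here is hypothetical scaffolding (the names `round_reward`, `unlock_prob`, and the specific concave map are not from the paper): an arm is a point on the probability simplex, and each task pays its reward with a probability determined by the budget fraction it receives.

```python
import numpy as np

rng = np.random.default_rng(0)

def round_reward(allocation, unlock_prob, rewards, rng):
    """One round of the model: task i pays rewards[i] with probability
    unlock_prob(allocation[i]); the round's reward is the sum of the
    unlocked payments. `unlock_prob` maps a budget fraction in [0, 1]
    to an unlock probability in [0, 1] (a hypothetical stand-in for the
    paper's budget-to-probability function)."""
    unlocked = rng.random(len(allocation)) < unlock_prob(allocation)
    return float(np.dot(unlocked, rewards))

# A hypothetical concave (diminishing-returns) unlock-probability map,
# of the kind under which the paper's improved bounds would apply.
concave = lambda x: 1.0 - np.exp(-3.0 * x)

K = 4                              # number of tasks/auctions (example value)
alloc = np.full(K, 1.0 / K)        # uniform split: a point on the simplex
rewards = np.ones(K)               # unit reward per unlocked task
print(round_reward(alloc, concave, rewards, rng))
```

A bandit algorithm in this model would choose `alloc` adaptively each round, observing only the realized sum; the sketch fixes a single allocation purely to show how one round's feedback is generated.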