Minimax Optimal Submodular Optimization with Bandit Feedback
Neural Information Processing Systems (NeurIPS), 2023
Abstract
We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner, but at each time $t = 1, \dots, T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \le k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret over $T$ rounds with respect to the $(1 - e^{-1})$-approximation of the maximum $f(S_*)$ with $|S_*| = k$, which is the value obtained through greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$, and by trivially treating every set as a unique arm one deduces that $\sqrt{\binom{n}{k} T}$ is also achievable. In this work, we establish the first minimax lower bound for this setting, which scales like $\tilde{\Omega}\big(\min_{L \le k} (L^{1/3} n^{1/3} T^{2/3} + \sqrt{\binom{n}{k-L} T})\big)$. Moreover, we propose an algorithm whose regret matches this lower bound.
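To make the feedback model and the $(1 - e^{-1})$-regret concrete, here is a minimal sketch under stated assumptions. It is not the algorithm proposed in this paper. It is a simple explore-then-commit greedy baseline. At each of $k$ stages it plays every one-element extension of the current set $m$ times, observes only the noisy scalar reward $f(S_t) + \eta_t$, and adds the element with the highest empirical mean. After that it commits to the final set for the remaining rounds. The names `make_coverage`, `etc_greedy`, and `greedy_regret` are hypothetical. So is the choice of a set-coverage function as the monotone submodular $f$, and so is Gaussian noise as the sub-Gaussian $\eta_t$. The optimum $f(S_*)$ is found by brute force, so the sketch only works for small $n$.

```python
import math
import random
from itertools import combinations


def make_coverage(sets):
    """Hypothetical monotone submodular f: fraction of the universe covered by the chosen subsets."""
    universe = set().union(*sets)

    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered) / len(universe)

    return f


def etc_greedy(f, n, k, T, m, sigma=0.1, seed=0):
    """Explore-then-commit greedy under bandit feedback (a simple baseline, not the paper's method).

    Stage j: play S + [i] m times for every unused element i and append the element
    with the highest empirical mean reward. After k stages, commit to S for the rest of T.
    Returns the list of sets S_t played at rounds t = 1..T.
    """
    rng = random.Random(seed)
    played, S = [], []
    for _ in range(k):
        best_i, best_mean = None, -math.inf
        for i in range(n):
            if i in S:
                continue
            cand = S + [i]
            # Bandit feedback: only the noisy scalar f(S_t) + eta_t is observed (Gaussian eta_t assumed).
            obs = [f(cand) + rng.gauss(0.0, sigma) for _ in range(m)]
            played.extend([cand] * m)
            mean = sum(obs) / m
            if mean > best_mean:
                best_i, best_mean = i, mean
        S.append(best_i)
    if len(played) > T:
        raise ValueError("horizon T is too short for the exploration budget")
    # Commit phase: play the greedily built set for all remaining rounds.
    played.extend([S] * (T - len(played)))
    return played


def greedy_regret(f, played, n, k):
    """(1 - 1/e)-regret: sum over t of (1 - 1/e) * f(S_*) - f(S_t), with S_* found by brute force."""
    f_star = max(f(list(c)) for c in combinations(range(n), k))
    return sum((1 - 1 / math.e) * f_star - f(S) for S in played)
```

Usage, on a small made-up instance:

```python
sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}, {6}, {1, 6, 7}]
f = make_coverage(sets)
played = etc_greedy(f, n=len(sets), k=2, T=2000, m=20)
print(greedy_regret(f, played, n=len(sets), k=2))
```

The exploration parameter `m` trades off time spent estimating marginal gains against time spent committed to the chosen set. Because the benchmark is only $(1 - e^{-1}) f(S_*)$, this regret can be negative when greedy happens to find a near-optimal set.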
