
Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Abstract

Many real-world problems, such as Social Influence Maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit that chooses $K$ out of $N$ arms at each time step, with the aim of achieving an efficient trade-off between exploration and exploitation. This is the first work on combinatorial bandits in which the feedback received can be a non-linear function of the chosen $K$ arms. Directly applying a multi-armed bandit algorithm requires choosing among $\binom{N}{K}$ options, which makes the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in $N$. The proposed algorithm, which we call CMAB-SM, is a divide-and-conquer based strategy. Further, the proposed algorithm achieves a regret bound of $\tilde{O}(K^{\frac{1}{2}} N^{\frac{1}{3}} T^{\frac{2}{3}})$ for a time horizon $T$, which is sub-linear in all parameters $T$, $N$, and $K$. When applied to the problem of Social Influence Maximization, the performance of the proposed algorithm surpasses the UCB algorithm and some more sophisticated domain-specific methods.
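
To make the scale argument concrete, the following minimal sketch compares the number of options a naive multi-armed bandit over all $K$-subsets would have to track with the growth of the stated regret bound. It is not the CMAB-SM algorithm; the function names `naive_arm_count` and `regret_bound_scale` are hypothetical, and the bound is evaluated without constants or logarithmic factors, which the abstract does not specify.

```python
from math import comb


def naive_arm_count(n: int, k: int) -> int:
    """Number of K-subsets of N arms, i.e. the options a naive
    bandit over all subsets would treat as separate arms."""
    return comb(n, k)


def regret_bound_scale(n: int, k: int, t: int) -> float:
    """K^(1/2) * N^(1/3) * T^(2/3), ignoring constants and log factors
    (assumption: only the polynomial part of the stated bound is used)."""
    return k ** 0.5 * n ** (1 / 3) * t ** (2 / 3)


if __name__ == "__main__":
    n, k = 50, 5
    # The subset count grows combinatorially, while per-arm storage is N entries.
    print(f"N = {n}, K = {k}: {naive_arm_count(n, k)} subsets vs {n} arms")

    # Average regret per round shrinks as the horizon T grows (sub-linear in T).
    for t in (10**3, 10**4, 10**5):
        print(f"T = {t}: bound / T = {regret_bound_scale(n, k, t) / t:.4f}")
```

For $N = 50$ and $K = 5$ there are already 2,118,760 subsets, which is why storage that is linear in $N$ matters.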
