On the optimal regret of collaborative personalized linear bandits

Stochastic linear bandits are a fundamental model for sequential decision making, where an agent selects a vector-valued action and receives a noisy reward whose expected value is given by an unknown linear function. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. Applying single-agent algorithms independently ignores cross-agent similarity and the opportunity to learn collaboratively. This paper investigates the optimal regret achievable in collaborative personalized linear bandits. We provide an information-theoretic lower bound that characterizes how the number of agents, the number of interaction rounds, and the degree of heterogeneity jointly affect regret. We then propose a new two-stage collaborative algorithm that achieves the optimal regret. Our analysis models heterogeneity via a hierarchical Bayesian framework and introduces a novel information-theoretic technique for bounding regret. Our results offer a complete characterization of when and how collaboration helps: the optimal regret bound takes three different forms depending on which of three ranges the number of rounds falls into, with the range boundaries governed, up to an absolute constant, by the level of heterogeneity and the number of agents. In contrast, agents without collaboration can at best achieve a regret bound that is strictly worse in the regimes where collaboration helps.
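To fix notation for the statements above (this formalization and the Gaussian prior are our own illustrative choices; the paper's exact setup may differ): each of $m$ agents holds an unknown parameter $\theta_i \in \mathbb{R}^d$; at each round $t = 1, \dots, n$, agent $i$ picks an action $x_{t,i}$ from a set $\mathcal{X} \subseteq \mathbb{R}^d$ and observes

$$r_{t,i} = \langle x_{t,i}, \theta_i \rangle + \eta_{t,i}, \qquad \theta_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(\theta^\star, \sigma^2 I_d),$$

where $\eta_{t,i}$ is zero-mean noise and the hierarchical prior encodes heterogeneity: small $\sigma$ means near-identical agents, large $\sigma$ means essentially unrelated ones. Collaborative regret then sums each agent's gap to its own best action,

$$\mathrm{Reg}(n) = \sum_{i=1}^{m} \sum_{t=1}^{n} \Big( \max_{x \in \mathcal{X}} \langle x, \theta_i \rangle - \langle x_{t,i}, \theta_i \rangle \Big).$$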
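The benefit of pooling can also be seen in a toy estimation experiment. Below is a minimal NumPy sketch of a generic collaborate-then-personalize scheme under the Gaussian model above: stage 1 pools all agents' data to estimate the shared mean, and stage 2 shrinks each agent's ridge estimate toward it. This is an illustrative stand-in, not the paper's two-stage algorithm; all parameter choices here are assumptions.

import numpy as np

# Illustrative collaborate-then-personalize sketch (NOT the paper's algorithm):
# stage 1 pools data across agents to estimate the shared mean parameter;
# stage 2 shrinks each agent's own estimate toward that pooled estimate.
rng = np.random.default_rng(0)
d, m, n = 5, 20, 20          # dimension, number of agents, samples per agent
sigma, noise = 0.1, 1.0      # heterogeneity level, reward-noise std

theta_star = rng.normal(size=d)                        # shared mean parameter
thetas = theta_star + sigma * rng.normal(size=(m, d))  # per-agent parameters

X = rng.normal(size=(m, n, d))                         # random exploratory actions
y = np.einsum('mtd,md->mt', X, thetas) + noise * rng.normal(size=(m, n))

# Stage 1: pooled ridge regression over all m*n samples.
Xp, yp = X.reshape(-1, d), y.reshape(-1)
shared = np.linalg.solve(Xp.T @ Xp + np.eye(d), Xp.T @ yp)

# Stage 2: per-agent ridge centered at the pooled estimate; with a Gaussian
# prior N(shared, sigma^2 I) the Bayes-optimal ridge weight is (noise/sigma)^2.
lam = (noise / sigma) ** 2
collab = np.stack([
    np.linalg.solve(X[i].T @ X[i] + lam * np.eye(d),
                    X[i].T @ y[i] + lam * shared)
    for i in range(m)])

# Baseline: each agent regresses alone with a generic ridge (prior at zero).
indep = np.stack([
    np.linalg.solve(X[i].T @ X[i] + np.eye(d), X[i].T @ y[i])
    for i in range(m)])

print("mean parameter error, collaborative:", np.linalg.norm(collab - thetas, axis=1).mean())
print("mean parameter error, independent:  ", np.linalg.norm(indep - thetas, axis=1).mean())

With few samples per agent and low heterogeneity, the pooled-then-personalized estimates track each agent's true parameter more closely than independent ridge regression, mirroring the regime in which the abstract says collaboration helps.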
@article{huang2025_2506.15943,
  title={On the optimal regret of collaborative personalized linear bandits},
  author={Bruce Huang and Ruida Zhou and Lin F. Yang and Suhas Diggavi},
  journal={arXiv preprint arXiv:2506.15943},
  year={2025}
}