
On the optimal regret of collaborative personalized linear bandits

Main: 28 pages, 4 figures; Bibliography: 2 pages
Abstract

Stochastic linear bandits are a fundamental model for sequential decision making, in which an agent selects a vector-valued action and receives a noisy reward whose expected value is given by an unknown linear function. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. Applying single-agent algorithms independently ignores cross-agent similarity and shared learning opportunities. This paper investigates the optimal regret achievable in collaborative personalized linear bandits. We provide an information-theoretic lower bound that characterizes how the number of agents, the number of interaction rounds, and the degree of heterogeneity jointly affect regret. We then propose a new two-stage collaborative algorithm that achieves this optimal regret. Our analysis models heterogeneity via a hierarchical Bayesian framework and introduces a novel information-theoretic technique for bounding regret. Our results give a complete characterization of when and how collaboration helps, with optimal regret bounds $\tilde{O}(d\sqrt{mn})$, $\tilde{O}(dm^{1-\gamma}\sqrt{n})$, and $\tilde{O}(dm\sqrt{n})$ for the number of rounds $n$ in the ranges $(0, \frac{d}{m\sigma^2})$, $[\frac{d}{m^{2\gamma}\sigma^2}, \frac{d}{\sigma^2}]$, and $(\frac{d}{\sigma^2}, \infty)$, respectively, where $\sigma$ measures the level of heterogeneity, $m$ is the number of agents, and $\gamma \in [0, 1/2]$ is an absolute constant. In contrast, agents without collaboration achieve at best a regret bound of $O(dm\sqrt{n})$.
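The three regimes above can be summarized in a single piecewise expression. This is only a restatement of the rates from the abstract; the symbol $R(n)$ for cumulative regret is notation introduced here for compactness, not the paper's own.

% Piecewise summary of the regret regimes stated in the abstract.
% R(n) denotes cumulative regret over n rounds (notation assumed here).
\[
R(n) =
\begin{cases}
\tilde{O}\!\left(d\sqrt{mn}\right), & n \in \left(0, \tfrac{d}{m\sigma^2}\right),\\[4pt]
\tilde{O}\!\left(dm^{1-\gamma}\sqrt{n}\right), & n \in \left[\tfrac{d}{m^{2\gamma}\sigma^2}, \tfrac{d}{\sigma^2}\right],\\[4pt]
\tilde{O}\!\left(dm\sqrt{n}\right), & n \in \left(\tfrac{d}{\sigma^2}, \infty\right),
\end{cases}
\]

Note that in the small-$n$ regime the collaborative bound $\tilde{O}(d\sqrt{mn})$ improves on the non-collaborative baseline $O(dm\sqrt{n})$ by a factor of $\sqrt{m}$, while in the large-$n$ regime (high effective heterogeneity) the two rates coincide.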

@article{huang2025_2506.15943,
  title={On the optimal regret of collaborative personalized linear bandits},
  author={Bruce Huang and Ruida Zhou and Lin F. Yang and Suhas Diggavi},
  journal={arXiv preprint arXiv:2506.15943},
  year={2025}
}