On the optimal regret of collaborative personalized linear bandits

Stochastic linear bandits are a fundamental model for sequential decision making, where an agent selects a vector-valued action and receives a noisy reward whose expected value is given by an unknown linear function. Although well studied in the single-agent setting, many real-world scenarios involve multiple agents solving heterogeneous bandit problems, each with a different unknown parameter. Applying single-agent algorithms independently ignores cross-agent similarity and the opportunity to learn collaboratively. This paper investigates the optimal regret achievable in collaborative personalized linear bandits. We provide an information-theoretic lower bound that characterizes how the number of agents, the number of interaction rounds, and the degree of heterogeneity jointly affect regret. We then propose a new two-stage collaborative algorithm that achieves the optimal regret. Our analysis models heterogeneity via a hierarchical Bayesian framework and introduces a novel information-theoretic technique for bounding regret. Our results offer a complete characterization of when and how collaboration helps: the optimal regret bound takes three different forms depending on which of three ranges the number of rounds falls into, with the range boundaries governed, up to an absolute constant, by the level of heterogeneity and the number of agents. In contrast, agents without collaboration can at best achieve a regret bound that is strictly worse in the regimes where collaboration helps.
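To fix notation for the statements above (this formalization and the Gaussian prior are our own illustrative choices; the paper's exact setup may differ): each of $m$ agents holds an unknown parameter $\theta_i \in \mathbb{R}^d$; at each round $t = 1, \dots, n$, agent $i$ picks an action $x_{t,i}$ from a set $\mathcal{X} \subseteq \mathbb{R}^d$ and observes

$$r_{t,i} = \langle x_{t,i}, \theta_i \rangle + \eta_{t,i}, \qquad \theta_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(\theta^\star, \sigma^2 I_d),$$

where $\eta_{t,i}$ is zero-mean noise and the hierarchical prior encodes heterogeneity: small $\sigma$ means near-identical agents, large $\sigma$ means essentially unrelated ones. Collaborative regret then sums each agent's gap to its own best action,

$$\mathrm{Reg}(n) = \sum_{i=1}^{m} \sum_{t=1}^{n} \Big( \max_{x \in \mathcal{X}} \langle x, \theta_i \rangle - \langle x_{t,i}, \theta_i \rangle \Big).$$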
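The benefit of pooling can also be seen in a toy estimation experiment. Below is a minimal NumPy sketch of a generic collaborate-then-personalize scheme under the Gaussian model above: stage 1 pools all agents' data to estimate the shared mean, and stage 2 shrinks each agent's ridge estimate toward it. This is an illustrative stand-in, not the paper's two-stage algorithm; all parameter choices here are assumptions.

import numpy as np

# Illustrative collaborate-then-personalize sketch (NOT the paper's algorithm):
# stage 1 pools data across agents to estimate the shared mean parameter;
# stage 2 shrinks each agent's own estimate toward that pooled estimate.
rng = np.random.default_rng(0)
d, m, n = 5, 20, 20          # dimension, number of agents, samples per agent
sigma, noise = 0.1, 1.0      # heterogeneity level, reward-noise std

theta_star = rng.normal(size=d)                        # shared mean parameter
thetas = theta_star + sigma * rng.normal(size=(m, d))  # per-agent parameters

X = rng.normal(size=(m, n, d))                         # random exploratory actions
y = np.einsum('mtd,md->mt', X, thetas) + noise * rng.normal(size=(m, n))

# Stage 1: pooled ridge regression over all m*n samples.
Xp, yp = X.reshape(-1, d), y.reshape(-1)
shared = np.linalg.solve(Xp.T @ Xp + np.eye(d), Xp.T @ yp)

# Stage 2: per-agent ridge centered at the pooled estimate; with a Gaussian
# prior N(shared, sigma^2 I) the Bayes-optimal ridge weight is (noise/sigma)^2.
lam = (noise / sigma) ** 2
collab = np.stack([
    np.linalg.solve(X[i].T @ X[i] + lam * np.eye(d),
                    X[i].T @ y[i] + lam * shared)
    for i in range(m)])

# Baseline: each agent regresses alone with a generic ridge (prior at zero).
indep = np.stack([
    np.linalg.solve(X[i].T @ X[i] + np.eye(d), X[i].T @ y[i])
    for i in range(m)])

print("mean parameter error, collaborative:", np.linalg.norm(collab - thetas, axis=1).mean())
print("mean parameter error, independent:  ", np.linalg.norm(indep - thetas, axis=1).mean())

With few samples per agent and low heterogeneity, the pooled-then-personalized estimates track each agent's true parameter more closely than independent ridge regression, mirroring the regime in which the abstract says collaboration helps.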
@article{huang2025_2506.15943,
  title={On the optimal regret of collaborative personalized linear bandits},
  author={Bruce Huang and Ruida Zhou and Lin F. Yang and Suhas Diggavi},
  journal={arXiv preprint arXiv:2506.15943},
  year={2025}
}