
Truncated LinUCB for Stochastic Linear Bandits

Abstract

This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension $d$ and time horizon $T$, due to its over-exploration. A truncated version of LinUCB is proposed and termed "Tr-LinUCB", which follows LinUCB up to a truncation time $S$ and performs pure exploitation afterwards. The Tr-LinUCB algorithm is shown to achieve $O(d\log(T))$ regret if $S = Cd\log(T)$ for a sufficiently large constant $C$, and a matching lower bound is established, which shows the rate optimality of Tr-LinUCB in both $d$ and $T$ under a low dimensional regime. Further, if $S = d\log^{\kappa}(T)$ for some $\kappa > 1$, the loss compared to the optimal is a multiplicative $\log\log(T)$ factor, which does not depend on $d$. This insensitivity to overshooting in choosing the truncation time of Tr-LinUCB is of practical importance.
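To make the algorithmic idea concrete, the following is a minimal sketch of a truncated LinUCB loop, assuming a per-arm linear model with a shared i.i.d. context each round; the helpers context_fn and reward_fn and the parameters alpha and lam are illustrative placeholders, not quantities taken from the paper.

import numpy as np

def tr_linucb(context_fn, reward_fn, T, S, K, d, alpha=1.0, lam=1.0):
    """Sketch of Tr-LinUCB: run LinUCB for the first S rounds, then act
    greedily (pure exploitation) on the ridge-regression estimates."""
    A = [lam * np.eye(d) for _ in range(K)]   # per-arm regularized Gram matrices
    b = [np.zeros(d) for _ in range(K)]       # per-arm context-weighted reward sums
    total_reward = 0.0
    for t in range(T):
        x = context_fn(t)                     # shared d-dimensional context
        theta_hat = [np.linalg.solve(A[a], b[a]) for a in range(K)]
        means = np.array([x @ theta_hat[a] for a in range(K)])
        if t < S:
            # Exploration phase: optimistic index = estimated mean + confidence width
            widths = np.array([
                alpha * np.sqrt(x @ np.linalg.solve(A[a], x)) for a in range(K)
            ])
            arm = int(np.argmax(means + widths))
        else:
            # Exploitation phase: greedy choice under the current estimates
            arm = int(np.argmax(means))
        r = reward_fn(t, arm)
        A[arm] += np.outer(x, x)
        b[arm] += r * x
        total_reward += r
    return total_reward

Following the abstract, one would set the truncation time as S = int(C * d * np.log(T)) for a sufficiently large constant C; the $\log\log(T)$ result suggests the regret is insensitive to overshooting this choice, e.g. taking $S = d\log^{\kappa}(T)$ with $\kappa > 1$.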

@article{song2025_2202.11735,
  title={Truncated LinUCB for Stochastic Linear Bandits},
  author={Yanglei Song and Meng Zhou},
  journal={arXiv preprint arXiv:2202.11735},
  year={2025}
}