
Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

Main: 8 pages
5 figures
Bibliography: 3 pages
1 table
Appendix: 12 pages
Abstract

We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been adapted to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al., 2024] develops a soft truncation method via adaptive Huber regression to address these limitations. However, their method incurs undesirable computational costs: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a \emph{one-pass} algorithm based on the online mirror descent framework. Our method updates using only the current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to the current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at round $t$.
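To make the one-pass idea concrete, the sketch below performs a single Huber-loss mirror-descent step per round against a growing quadratic metric, keeping only $\mathcal{O}(d^2)$ state so the per-round cost is constant in $t$ and $T$. This is a minimal illustration under assumed choices, not the paper's exact algorithm: the class name `OnePassHuberOMD` and the fixed parameters `tau` and `lam` are hypothetical, whereas the paper tunes the Huber threshold adaptively.

```python
import numpy as np

def huber_score(residual, tau):
    """Derivative of the Huber loss: identity inside [-tau, tau], clipped outside."""
    return np.clip(residual, -tau, tau)

class OnePassHuberOMD:
    """Illustrative one-pass Huber-regression update in the OMD style.

    Stores only the estimate theta and a d x d matrix V, and touches only
    the current observation each round. tau (Huber threshold) and lam
    (regularizer) are assumed constants, not the paper's schedule.
    """
    def __init__(self, d, lam=1.0, tau=1.0):
        self.theta = np.zeros(d)
        self.V = lam * np.eye(d)  # quadratic mirror map / local metric

    def update(self, x, r):
        self.V += np.outer(x, x)  # grow the metric with the new feature
        # Gradient of the Huber loss at the current estimate:
        g = -huber_score(r - x @ self.theta, tau=1.0) * x
        # One mirror-descent step with respect to the V-weighted norm:
        self.theta -= np.linalg.solve(self.V, g)
        return self.theta

# Quick demo with heavy-tailed (Student-t) reward noise:
rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)
learner = OnePassHuberOMD(d=5)
for t in range(1000):
    x = rng.normal(size=5)
    x /= np.linalg.norm(x)
    r = x @ theta_star + rng.standard_t(df=2)  # infinite-variance noise
    learner.update(x, r)
```

The clipping in `huber_score` is what makes the update robust: a single heavy-tailed reward can shift the estimate by at most a bounded amount, while near-zero residuals are treated exactly as in least squares.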
