We study stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al., 2024] develops a soft truncation method via the adaptive Huber regression to address these limitations. However, their method suffers from an undesired computational cost: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a \emph{one-pass} algorithm based on the online mirror descent framework. Our method updates using only the current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to the current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\textstyle\sum_{t=1}^{T} \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at round $t$.
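The abstract does not spell out the update rule, but a minimal sketch may help convey the idea of a one-pass, Huber-style update in an online-mirror-descent framework: each round touches only the current observation and keeps $\mathcal{O}(d^2)$ state, rather than replaying all past data. Everything below (the class `OnePassHuberEstimator`, the helper `huber_grad`, the step size `eta`, the quadratic regularizer built from the covariance matrix, and the per-round threshold `tau`) is an illustrative assumption, not the authors' exact algorithm.

```python
import numpy as np

def huber_grad(residual, tau):
    # Gradient of the Huber loss w.r.t. the prediction: quadratic
    # (identity gradient) for small residuals, soft-truncated
    # (clipped) for large ones -- robust to heavy-tailed rewards.
    return np.clip(residual, -tau, tau)

class OnePassHuberEstimator:
    """Hypothetical sketch of a one-pass estimator in the spirit of
    the abstract: an online-mirror-descent step on the Huber loss,
    using only the current (x, r) pair each round."""

    def __init__(self, dim, lam=1.0, eta=0.5):
        self.theta = np.zeros(dim)      # current parameter estimate
        self.H = lam * np.eye(dim)      # data-dependent quadratic regularizer
        self.eta = eta                  # OMD step size (assumed constant here)

    def update(self, x, r, tau):
        # One-pass update: no pass over historical data.
        self.H += np.outer(x, x)
        residual = r - x @ self.theta
        grad = -huber_grad(residual, tau) * x       # Huber-loss gradient
        # OMD step under the geometry induced by H:
        # theta <- theta - eta * H^{-1} grad.
        self.theta -= self.eta * np.linalg.solve(self.H, grad)
        return self.theta
```

In this sketch, the mirror map is the quadratic norm induced by the running covariance matrix `H`, so the per-round cost is independent of the round index $t$, matching the $\mathcal{O}(1)$ claim (in $t$) of the abstract.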
@article{wang2025_2503.00419,
  title   = {Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update},
  author  = {Jing Wang and Yu-Jie Zhang and Peng Zhao and Zhi-Hua Zhou},
  journal = {arXiv preprint arXiv:2503.00419},
  year    = {2025}
}