
Exploration via Feature Perturbation in Contextual Bandits

Main: 9 pages · 10 figures · 2 tables · Bibliography: 4 pages · Appendix: 21 pages
Abstract

We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs, instead of randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves an $\tilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, avoiding the $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is computationally efficient and extends naturally to non-parametric and neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also combines strong practical performance with near-optimal regret guarantees.
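The abstract does not spell out the exact update rule, so the following is only a minimal sketch of the general idea for the simpler *linear* case. It perturbs each arm's feature vector and then scores it greedily against a single point estimate, with no parameter sampling. The ridge-regression estimator, the per-arm Gaussian noise model $\mathcal{N}(0, \sigma^2 V^{-1})$, the scale `sigma`, and the names `select_arm`, `update` and `run` are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np


def select_arm(features, V, b, sigma, rng):
    """Score perturbed features against a point estimate (hypothetical sketch).

    features: (K, d) array, one context vector per arm.
    V: (d, d) regularized Gram matrix, lam * I + sum of x x^T over past pulls.
    b: (d,) sum of reward-weighted features over past pulls.
    sigma: perturbation scale (assumed tuning parameter).
    """
    # Ridge-regression point estimate; the parameter itself is never sampled.
    theta_hat = np.linalg.solve(V, b)
    # Assumed noise model: each arm gets an independent draw from N(0, sigma^2 V^{-1}).
    L = np.linalg.cholesky(np.linalg.inv(V))
    noise = sigma * rng.standard_normal(features.shape) @ L.T
    # Randomness enters through the features, not through theta_hat.
    scores = (features + noise) @ theta_hat
    return int(np.argmax(scores))


def update(V, b, x, reward):
    """Rank-one update of the ridge-regression statistics after pulling x."""
    return V + np.outer(x, x), b + reward * x


def run(T=2000, K=10, d=5, sigma=0.5, lam=1.0, seed=0):
    """Simulate a synthetic linear bandit and return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    V, b = lam * np.eye(d), np.zeros(d)
    regret = 0.0
    for _ in range(T):
        X = rng.standard_normal((K, d)) / np.sqrt(d)  # fresh contexts each round
        a = select_arm(X, V, b, sigma, rng)
        means = X @ theta_star
        reward = means[a] + 0.1 * rng.standard_normal()
        regret += means.max() - means[a]
        V, b = update(V, b, X[a], reward)
    return regret
```

Calling `run()` returns the cumulative regret of this toy policy on synthetic data. The design choice this sketch illustrates is that only one $d \times d$ solve per round is needed to form `theta_hat`, while exploration comes from independent per-arm feature noise.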
