Exploration via Feature Perturbation in Contextual Bandits
Category: AAML · Main: 9 pages · Appendix: 21 pages · Bibliography: 4 pages · 10 figures · 2 tables
Abstract
We propose feature perturbation, a simple yet powerful technique that injects randomness directly into the feature inputs rather than randomizing unknown parameters or adding noise to rewards. Remarkably, the resulting algorithm achieves a worst-case regret bound for generalized linear bandits while avoiding the extra regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is computationally efficient and extends naturally to non-parametric and neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also unifies strong practical performance with best-known theoretical guarantees.
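To make the core idea concrete, here is a minimal sketch of feature perturbation in the linear-bandit special case: noise is added to the feature inputs themselves, after which the learner acts greedily with respect to a ridge estimate. The Gaussian noise, the fixed scale `sigma`, and the helper name `select_arm` are illustrative assumptions, not the paper's exact algorithm or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_arm(X, theta_hat, sigma=0.1):
    """Feature-perturbation arm selection (linear-bandit sketch).

    Instead of sampling a parameter (as in Thompson sampling) or
    perturbing observed rewards, add noise directly to the features,
    then pick the greedy arm under the current estimate.

    X         : (K, d) array, one feature row per arm
    theta_hat : (d,) current parameter estimate
    sigma     : perturbation scale (illustrative constant; the paper's
                schedule may differ)
    """
    X_tilde = X + sigma * rng.standard_normal(X.shape)  # randomize the inputs
    return int(np.argmax(X_tilde @ theta_hat))          # greedy on perturbed features

# Toy interaction loop with a ridge-regression estimate.
d, K, T = 3, 5, 200
theta_star = np.array([0.5, -0.2, 0.8])   # unknown true parameter (simulation only)
A, b = np.eye(d), np.zeros(d)             # ridge statistics: A = I + sum x x^T
for t in range(T):
    X = rng.standard_normal((K, d))       # fresh contexts each round
    a = select_arm(X, np.linalg.solve(A, b))
    r = X[a] @ theta_star + 0.1 * rng.standard_normal()  # noisy linear reward
    A += np.outer(X[a], X[a])
    b += r * X[a]
theta_hat = np.linalg.solve(A, b)
```

Note that exploration comes entirely from the input noise: no posterior over parameters is maintained, which is why the same selection rule drops in unchanged when `theta_hat @ x` is replaced by a non-parametric or neural reward model.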
