Optimization of Epsilon-Greedy Exploration

3 June 2025
Ethan Che
Hakan Ceylan
James McInerney
Nathan Kallus
Main text: 10 pages · 2 figures · 2 tables · Bibliography: 1 page
Abstract

Modern recommendation systems rely on exploration to learn user preferences for new items, and typically implement uniform exploration policies (e.g., epsilon-greedy) because of their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration: what fraction of user traffic should receive random item recommendations, and how should this fraction evolve over time? While various heuristics exist for navigating the resulting exploration-exploitation tradeoff, selecting optimal exploration rates is complicated by practical constraints, including batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements. In this work, we propose a principled framework for determining the exploration schedule by directly minimizing Bayesian regret through stochastic gradient descent (SGD), allowing for dynamic exploration-rate adjustment via Model Predictive Control (MPC). Through extensive experiments on recommendation datasets, we demonstrate that variations in batch size across periods significantly influence the optimal exploration strategy. Our optimization methods automatically calibrate exploration to the specific problem setting, consistently matching or outperforming the best heuristic for each setting.
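The setting in the abstract is easiest to see in code. Below is a minimal sketch of batched epsilon-greedy with a time-varying exploration schedule; it is not the authors' implementation, and every name and constant (`epsilon_schedule`, the decay shape, the batch size) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_step(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random arm, else the greedy arm."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

num_arms, num_periods, batch_size = 5, 20, 100
true_means = rng.normal(0.0, 1.0, num_arms)                       # unknown reward means
epsilon_schedule = 0.5 / np.sqrt(np.arange(1, num_periods + 1))   # assumed decaying schedule

counts = np.zeros(num_arms)
q_values = np.zeros(num_arms)
for eps in epsilon_schedule:
    # Batched feedback: every action in a period uses the same (stale) estimates,
    # which are refreshed only at the period boundary, as the abstract describes.
    arms = [epsilon_greedy_step(q_values, eps, rng) for _ in range(batch_size)]
    rewards = [rng.normal(true_means[a], 1.0) for a in arms]
    for a, r in zip(arms, rewards):
        counts[a] += 1
        q_values[a] += (r - q_values[a]) / counts[a]

print("estimated means:", np.round(q_values, 2))
print("true means:     ", np.round(true_means, 2))
```

The paper's core idea, choosing the schedule by minimizing Bayesian regret with stochastic gradient descent, can likewise be caricatured in a few lines. The toy below estimates gradients of Monte Carlo regret by central finite differences rather than the paper's stochastic-gradient machinery, so treat it as a sketch of the optimization loop only; `sigmoid`, the 20-period horizon, and all step sizes are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_regret(epsilons, rng, num_arms=5, batch_size=100):
    """Simulate one batched epsilon-greedy episode; return its total regret."""
    true_means = rng.normal(0.0, 1.0, num_arms)
    best = true_means.max()
    counts, q = np.zeros(num_arms), np.zeros(num_arms)
    regret = 0.0
    for eps in epsilons:
        explore = rng.random(batch_size) < eps
        arms = np.where(explore, rng.integers(num_arms, size=batch_size), int(np.argmax(q)))
        rewards = rng.normal(true_means[arms], 1.0)
        regret += float((best - true_means[arms]).sum())
        for a, r in zip(arms, rewards):
            counts[a] += 1
            q[a] += (r - q[a]) / counts[a]
    return regret

# SGD on schedule parameters theta, where epsilon_t = sigmoid(theta_t).
# Gradients of Monte Carlo regret are estimated by central finite differences,
# with common random numbers (shared seeds) to reduce estimator variance.
master = np.random.default_rng(1)
theta = np.full(20, -1.0)
lr, delta = 0.01, 0.1
for _ in range(50):
    grad = np.zeros_like(theta)
    for t in range(len(theta)):
        bump = np.zeros_like(theta)
        bump[t] = delta
        seed = int(master.integers(1 << 30))
        r_plus = simulate_regret(sigmoid(theta + bump), np.random.default_rng(seed))
        r_minus = simulate_regret(sigmoid(theta - bump), np.random.default_rng(seed))
        grad[t] = (r_plus - r_minus) / (2 * delta)
    theta -= lr * grad

print("optimized epsilon schedule:", np.round(sigmoid(theta), 3))
```

In practice one would average regret over many simulated episodes per gradient evaluation; a single episode per evaluation keeps the sketch short but makes the gradient estimate noisy.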

View on arXiv: https://arxiv.org/abs/2506.03324
@article{che2025_2506.03324,
  title={Optimization of Epsilon-Greedy Exploration},
  author={Ethan Che and Hakan Ceylan and James McInerney and Nathan Kallus},
  journal={arXiv preprint arXiv:2506.03324},
  year={2025}
}