Optimization of Epsilon-Greedy Exploration

3 June 2025
Ethan Che
Hakan Ceylan
James McInerney
Nathan Kallus
Main text: 10 pages · 2 figures · 2 tables · Bibliography: 1 page
Abstract

Modern recommendation systems rely on exploration to learn user preferences for new items, and typically implement uniform exploration policies (e.g., epsilon-greedy) because of their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration: what fraction of user traffic should receive random item recommendations, and how should this fraction evolve over time? While various heuristics exist for navigating the resulting exploration-exploitation tradeoff, selecting optimal exploration rates is complicated by practical constraints, including batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements. In this work, we propose a principled framework for determining the exploration schedule by directly minimizing Bayesian regret through stochastic gradient descent (SGD), allowing for dynamic exploration-rate adjustment via Model Predictive Control (MPC). Through extensive experiments on recommendation datasets, we demonstrate that variations in batch size across periods significantly influence the optimal exploration strategy. Our optimization methods automatically calibrate exploration to the specific problem setting, consistently matching or outperforming the best heuristic for each setting.
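The setting in the abstract is easiest to see in code. Below is a minimal sketch of batched epsilon-greedy with a time-varying exploration schedule; it is not the authors' implementation, and every name and constant (`epsilon_schedule`, the decay shape, the batch size) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_step(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random arm, else the greedy arm."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

num_arms, num_periods, batch_size = 5, 20, 100
true_means = rng.normal(0.0, 1.0, num_arms)                       # unknown reward means
epsilon_schedule = 0.5 / np.sqrt(np.arange(1, num_periods + 1))   # assumed decaying schedule

counts = np.zeros(num_arms)
q_values = np.zeros(num_arms)
for eps in epsilon_schedule:
    # Batched feedback: every action in a period uses the same (stale) estimates,
    # which are refreshed only at the period boundary, as the abstract describes.
    arms = [epsilon_greedy_step(q_values, eps, rng) for _ in range(batch_size)]
    rewards = [rng.normal(true_means[a], 1.0) for a in arms]
    for a, r in zip(arms, rewards):
        counts[a] += 1
        q_values[a] += (r - q_values[a]) / counts[a]

print("estimated means:", np.round(q_values, 2))
print("true means:     ", np.round(true_means, 2))
```

The paper's core idea, choosing the schedule by minimizing Bayesian regret with stochastic gradient descent, can likewise be caricatured in a few lines. The toy below estimates gradients of Monte Carlo regret by central finite differences rather than the paper's stochastic-gradient machinery, so treat it as a sketch of the optimization loop only; `sigmoid`, the 20-period horizon, and all step sizes are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_regret(epsilons, rng, num_arms=5, batch_size=100):
    """Simulate one batched epsilon-greedy episode; return its total regret."""
    true_means = rng.normal(0.0, 1.0, num_arms)
    best = true_means.max()
    counts, q = np.zeros(num_arms), np.zeros(num_arms)
    regret = 0.0
    for eps in epsilons:
        explore = rng.random(batch_size) < eps
        arms = np.where(explore, rng.integers(num_arms, size=batch_size), int(np.argmax(q)))
        rewards = rng.normal(true_means[arms], 1.0)
        regret += float((best - true_means[arms]).sum())
        for a, r in zip(arms, rewards):
            counts[a] += 1
            q[a] += (r - q[a]) / counts[a]
    return regret

# SGD on schedule parameters theta, where epsilon_t = sigmoid(theta_t).
# Gradients of Monte Carlo regret are estimated by central finite differences,
# with common random numbers (shared seeds) to reduce estimator variance.
master = np.random.default_rng(1)
theta = np.full(20, -1.0)
lr, delta = 0.01, 0.1
for _ in range(50):
    grad = np.zeros_like(theta)
    for t in range(len(theta)):
        bump = np.zeros_like(theta)
        bump[t] = delta
        seed = int(master.integers(1 << 30))
        r_plus = simulate_regret(sigmoid(theta + bump), np.random.default_rng(seed))
        r_minus = simulate_regret(sigmoid(theta - bump), np.random.default_rng(seed))
        grad[t] = (r_plus - r_minus) / (2 * delta)
    theta -= lr * grad

print("optimized epsilon schedule:", np.round(sigmoid(theta), 3))
```

In practice one would average regret over many simulated episodes per gradient evaluation; a single episode per evaluation keeps the sketch short but makes the gradient estimate noisy.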

View on arXiv: https://arxiv.org/abs/2506.03324
@article{che2025_2506.03324,
  title={Optimization of Epsilon-Greedy Exploration},
  author={Ethan Che and Hakan Ceylan and James McInerney and Nathan Kallus},
  journal={arXiv preprint arXiv:2506.03324},
  year={2025}
}