  3. 1810.03024
22
132

Learning to Optimize under Non-Stationarity

6 October 2018
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
Abstract

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting, which captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the \emph{variation budget}, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (\texttt{SW-UCB}) algorithm with optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the tuning-free bandit-over-bandit (\texttt{BOB}) framework, built on top of the \texttt{SW-UCB} algorithm, with the best $\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.
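
To make the sliding-window idea concrete, below is a minimal Python sketch of an SW-UCB-style learner, assuming a finite arm set and a hypothetical reward_fn environment oracle; the window length w and exploration bonus beta are plain inputs here, whereas the paper tunes the window as a function of d, T, and the variation budget B_T. This is an illustration of the technique under those assumptions, not the authors' exact implementation.

import numpy as np

def sw_ucb(arms, reward_fn, T, w, lam=1.0, beta=1.0):
    # Sliding Window UCB sketch for non-stationary linear bandits.
    # arms: (K, d) array of candidate actions (a finite arm set is assumed here).
    # reward_fn(t, x): hypothetical environment oracle returning a noisy reward.
    # w: window length (w >= 1); only the last w observations inform the estimate,
    #    so stale data from a drifted environment is forgotten.
    K, d = arms.shape
    history = []          # (action, reward) pairs
    rewards = np.zeros(T)
    for t in range(T):
        V = lam * np.eye(d)   # regularized Gram matrix over the window
        b = np.zeros(d)
        for x, y in history[-w:]:
            V += np.outer(x, x)
            b += y * x
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # windowed ridge-regression estimate of the parameter
        # Optimistic index: estimated mean reward plus an exploration bonus
        # proportional to each arm's V^{-1}-norm.
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
        x = arms[np.argmax(arms @ theta_hat + beta * bonus)]
        y = reward_fn(t, x)
        history.append((x, y))
        rewards[t] = y
    return rewards

The \texttt{BOB} layer described in the abstract removes the need to know $B_T$: it partitions the horizon into blocks and runs an adversarial bandit algorithm (EXP3 in the paper) over a small set of candidate window lengths, restarting the sliding-window learner in each block with the window length the meta-learner selects.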
