
arXiv:1907.12340

Bandit Convex Optimization in Non-stationary Environments

29 July 2019
Peng Zhao
G. Wang
Lijun Zhang
Zhi-Hua Zhou
Abstract

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, which is defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence that reflects the non-stationarity of environments. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret respectively for the one-point and two-point feedback models. The latter result is optimal, matching the $\Omega(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is more adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$ ahead of time, which is generally unknown.
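
The quantities named in the abstract can be made concrete with a small sketch. It is not the paper's algorithm, only an illustration of the performance measure. It assumes the path-length is measured as the sum of Euclidean distances between consecutive comparators, and the helper names (`path_length`, `dynamic_regret`, `bound_rate`) are hypothetical. The rate function drops all constants hidden by the $O(\cdot)$ notation.

```python
import math


def path_length(comparators):
    """P_T: sum of distances between consecutive comparators.

    Assumption: the Euclidean norm is used to measure movement.
    """
    return sum(math.dist(u_prev, u_next)
               for u_prev, u_next in zip(comparators, comparators[1:]))


def dynamic_regret(losses, decisions, comparators):
    """Cumulative loss of the player minus that of the comparator sequence."""
    return sum(f(x) - f(u) for f, x, u in zip(losses, decisions, comparators))


def bound_rate(T, P_T, exponent):
    """T^exponent * (1 + P_T)^(1/2), with all constants dropped.

    exponent = 3/4 corresponds to the one-point rate in the abstract,
    exponent = 1/2 to the two-point rate.
    """
    return T ** exponent * math.sqrt(1.0 + P_T)


# Toy 1-D example: quadratic losses whose minimizers drift over time.
centers = [(0.0,), (1.0,), (1.0,), (3.0,)]
losses = [lambda x, c=c: (x[0] - c[0]) ** 2 for c in centers]

# A player that never moves, compared against the per-round minimizers.
decisions = [(0.0,)] * len(centers)
comparators = centers

P_T = path_length(comparators)
regret = dynamic_regret(losses, decisions, comparators)
one_point_rate = bound_rate(len(centers), P_T, 0.75)
two_point_rate = bound_rate(len(centers), P_T, 0.5)
```

In this toy run the comparator sequence tracks the drifting minimizers, so a static player accumulates regret as the path-length grows. A fixed comparator would give $P_T = 0$, and both rates then reduce to their static-regret forms $T^{3/4}$ and $T^{1/2}$.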
