arXiv:2002.03580

Combinatorial Semi-Bandit in the Non-Stationary Environment

10 February 2020
Wei Chen
Liwei Wang
Haoyu Zhao
Kai Zheng
Abstract

In this paper, we investigate the non-stationary combinatorial semi-bandit problem in both the switching case and the dynamic case. In the general case where (a) the reward function is non-linear, (b) arms may be probabilistically triggered, and (c) only an approximate offline oracle exists \cite{wang2017improving}, our algorithm achieves $\tilde{\mathcal{O}}(\sqrt{\mathcal{S}T})$ distribution-dependent regret in the switching case and $\tilde{\mathcal{O}}(\mathcal{V}^{1/3}T^{2/3})$ regret in the dynamic case, where $\mathcal{S}$ is the number of switches and $\mathcal{V}$ is the sum of the total "distribution changes". The regret bounds in both scenarios are nearly optimal, but our algorithm needs to know the parameter $\mathcal{S}$ or $\mathcal{V}$ in advance. We further show that, by employing another technique, our algorithm no longer needs to know $\mathcal{S}$ or $\mathcal{V}$, although the regret bounds may then become suboptimal. In the special case where the reward function is linear and an exact oracle is available, we design a parameter-free algorithm that achieves nearly optimal regret in both the switching case and the dynamic case without knowing the parameters in advance.
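To make the switching setting concrete, here is a minimal Python sketch of a non-stationary combinatorial semi-bandit with a linear reward and an exact oracle: each round the learner picks k of n Bernoulli arms, observes every chosen arm's outcome (semi-bandit feedback), and the arm means switch abruptly over time. The learner shown is a generic CUCB-style index policy with periodic restarts whose epoch length is set from a known number of switches S. This is an assumed, textbook-style illustration and not the algorithm from the paper; the function name `restart_cucb_topk`, the epoch rule of roughly sqrt(T/S), and the confidence constant 1.5 are all hypothetical choices.

```python
import math
import random


def restart_cucb_topk(means_at, n_arms, k, horizon, n_switches, seed=0):
    """Hypothetical sketch: CUCB-style top-k semi-bandit with periodic restarts.

    means_at(t) returns the Bernoulli means of all arms at round t, so the
    environment may change over time. The exact oracle for this linear
    top-k problem is "pick the k arms with the largest indices".
    Returns the cumulative pseudo-regret against the per-round best super arm.
    """
    rng = random.Random(seed)
    # Assumed epoch length ~ sqrt(T / S), a common heuristic for restart-based
    # methods when the number of switches S is known in advance.
    epoch = max(1, int(math.sqrt(horizon / max(1, n_switches))))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    regret = 0.0
    for t in range(horizon):
        if t % epoch == 0:
            # Restart: discard statistics that may come from an old distribution.
            counts = [0] * n_arms
            sums = [0.0] * n_arms
        local_t = t % epoch + 1
        index = []
        for i in range(n_arms):
            if counts[i] == 0:
                index.append(float("inf"))  # unseen arms are tried first
            else:
                bonus = math.sqrt(1.5 * math.log(local_t) / counts[i])
                index.append(sums[i] / counts[i] + bonus)
        # Exact oracle for the linear top-k reward: the k largest indices.
        chosen = sorted(range(n_arms), key=lambda i: index[i], reverse=True)[:k]
        mu = means_at(t)
        for i in chosen:
            # Semi-bandit feedback: each chosen arm's outcome is observed.
            outcome = 1.0 if rng.random() < mu[i] else 0.0
            counts[i] += 1
            sums[i] += outcome
        best = sum(sorted(mu, reverse=True)[:k])
        regret += best - sum(mu[i] for i in chosen)
    return regret


def means_at(t):
    # Illustrative environment with a single switch at t = 5000:
    # the two best arms become the two worst.
    return [0.9, 0.8, 0.3, 0.2] if t < 5000 else [0.2, 0.3, 0.8, 0.9]


print(restart_cucb_topk(means_at, n_arms=4, k=2, horizon=10000, n_switches=1))
```

Restarting is the simplest way to keep stale statistics from dominating after a switch, and it is why this kind of policy needs S (or a similar budget) in advance to size its epochs. Sliding windows or change detection are common alternatives that adapt differently.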
