ResearchTrend.AI
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs

18 May 2022
Ian A. Kash
L. Reyzin
Zishun Yu
arXiv: 2205.09056

Papers citing "Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs"

6 / 6 papers shown
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
Washim Uddin Mondal
Vaneet Aggarwal
14
2
0
04 May 2023
On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs
Zixuan Dong
Che Wang
Keith Ross
18
3
0
07 Sep 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
59
21
0
31 Jan 2022
Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains
S. Rasoul Etesami
8
6
0
28 Jan 2022
UCB Momentum Q-learning: Correcting the bias without forgetting
Pierre Menard
O. D. Domingues
Xuedong Shang
Michal Valko
72
40
0
01 Mar 2021
Bounded regret in stochastic multi-armed bandits
Sébastien Bubeck
Vianney Perchet
Philippe Rigollet
56
90
0
06 Feb 2013