Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
Ian A. Kash, L. Reyzin, Zishun Yu
arXiv:2205.09056, 18 May 2022
Papers citing "Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs" (6 of 6 papers shown)
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
Washim Uddin Mondal, Vaneet Aggarwal
04 May 2023
On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs
Zixuan Dong, Che Wang, Keith Ross
07 Sep 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv A. Rosenberg
31 Jan 2022
Learning Stationary Nash Equilibrium Policies in n-Player Stochastic Games with Independent Chains
S. Rasoul Etesami
28 Jan 2022
UCB Momentum Q-learning: Correcting the bias without forgetting
Pierre Menard, O. D. Domingues, Xuedong Shang, Michal Valko
01 Mar 2021
Bounded regret in stochastic multi-armed bandits
Sébastien Bubeck, Vianney Perchet, Philippe Rigollet
06 Feb 2013