Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback

13 May 2023

Papers citing "Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback"

Title | Authors | Tag | Stats | Date
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback | Asaf B. Cassel, Haipeng Luo, Aviv A. Rosenberg, Dmitry Sotnikov | OffRL | 14 3 0 | 13 May 2024
A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback | Saeed Masoudian, Julian Zimmert, Yevgeny Seldin | - | 18 18 0 | 29 Jun 2022
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback | Yan Dai, Haipeng Luo, Liyu Chen | - | 52 19 0 | 26 May 2022
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs | Dongruo Zhou, Quanquan Gu | - | 73 43 0 | 23 May 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback | Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv A. Rosenberg | - | 59 21 0 | 31 Jan 2022
Cooperative Online Learning in Stochastic and Adversarial MDPs | Tal Lancewicki, Aviv A. Rosenberg, Yishay Mansour | - | 56 3 0 | 31 Jan 2022
Nonstochastic Bandits with Composite Anonymous Feedback | Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour | - | 72 39 0 | 06 Dec 2021
Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation | Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant | - | 24 15 0 | 04 May 2021