Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback

13 May 2023
Tal Lancewicki
Aviv A. Rosenberg
Dmitry Sotnikov
arXiv:2305.07911

Papers citing "Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback"

8 / 8 papers shown
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
14
3
0
13 May 2024
A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
Saeed Masoudian
Julian Zimmert
Yevgeny Seldin
18
18
0
29 Jun 2022
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
Yan Dai
Haipeng Luo
Liyu Chen
52
19
0
26 May 2022
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
Dongruo Zhou
Quanquan Gu
73
43
0
23 May 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
59
21
0
31 Jan 2022
Cooperative Online Learning in Stochastic and Adversarial MDPs
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
56
3
0
31 Jan 2022
Nonstochastic Bandits with Composite Anonymous Feedback
Nicolò Cesa-Bianchi
Tommaso Cesari
Roberto Colomboni
Claudio Gentile
Yishay Mansour
72
39
0
06 Dec 2021
Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation
Daniel Vial
Advait Parulekar
Sanjay Shakkottai
R. Srikant
24
15
0
04 May 2021