Learning Adversarial Markov Decision Processes with Delayed Feedback

AAAI Conference on Artificial Intelligence (AAAI), 2020
29 December 2020
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour

Papers citing "Learning Adversarial Markov Decision Processes with Delayed Feedback"

30 / 30 papers shown
Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback
Orin Levy
Liad Erez
Alon Cohen
Yishay Mansour
144
2
0
10 Oct 2025
Exploiting Curvature in Online Convex Optimization with Delayed Feedback
Hao Qiu
Emmanuel Esposito
Mengxiao Zhang
221
6
0
09 Jun 2025
Identifying Predictions That Influence the Future: Detecting Performative Concept Drift in Data Streams
AAAI Conference on Artificial Intelligence (AAAI), 2024
Brandon Gower-Winter
Georg Krempl
Sergey Dragomiretskiy
Tineke Jelsma
Arno Siebes
484
3
0
13 Dec 2024
Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi
Yue Kang
Yao Li
442
3
0
26 Aug 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Jean-Michel Poggi
389
1
0
08 Jul 2024
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
366
5
0
03 Jul 2024
Non-stochastic Bandits With Evolving Observations
Yogev Bar-On
Yishay Mansour
325
3
0
27 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
International Conference on Machine Learning (ICML), 2024
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
378
6
0
13 May 2024
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Neural Information Processing Systems (NeurIPS), 2023
Nikki Lijing Kuang
Ming Yin
Mengdi Wang
Yu Wang
Yian Ma
364
7
0
29 Oct 2023
Statistical Inference on Multi-armed Bandits with Delayed Feedback
International Conference on Machine Learning (ICML), 2023
Lei Shi
Jingshen Wang
Tianhao Wu
357
7
0
03 Jul 2023
Online Resource Allocation in Episodic Markov Decision Processes
Duksang Lee
William Overman
Dabeen Lee
399
1
0
18 May 2023
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
Annual Conference Computational Learning Theory (COLT), 2023
Dirk van der Hoeven
Lukas Zierahn
Tal Lancewicki
Aviv A. Rosenberg
Nicolò Cesa-Bianchi
350
13
0
15 May 2023
Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
International Conference on Machine Learning (ICML), 2023
Tal Lancewicki
Aviv A. Rosenberg
Dmitry Sotnikov
217
6
0
13 May 2023
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
Washim Uddin Mondal
Vaneet Aggarwal
287
3
0
04 May 2023
Safe Networked Robotics with Probabilistic Verification
IEEE Robotics and Automation Letters (RA-L), 2023
Sai Shankar Narasimhan
Sharachchandra Bhat
Sandeep Chinchali
348
3
0
17 Feb 2023
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
Neural Information Processing Systems (NeurIPS), 2023
Yunchang Yang
Hangshi Zhong
Tianhao Wu
B. Liu
Liwei Wang
S. Du
OffRL
599
10
0
03 Feb 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
International Conference on Machine Learning (ICML), 2023
Uri Sherman
Tomer Koren
Yishay Mansour
378
14
0
30 Jan 2023
Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning
International Conference on Machine Learning (ICML), 2023
Jiatai Huang
Yan Dai
Longbo Huang
411
7
0
25 Jan 2023
Multi-Agent Reinforcement Learning with Reward Delays
Conference on Learning for Dynamics & Control (L4DC), 2022
Yuyang Zhang
Runyu Zhang
Yu Gu
Na Li
278
14
0
02 Dec 2022
Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards
Neural Information Processing Systems (NeurIPS), 2022
Ashwinkumar Badanidiyuru
Zhe Feng
Tianxi Li
Haifeng Xu
OffRL
375
4
0
02 Jun 2022
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
Neural Information Processing Systems (NeurIPS), 2022
Yan Dai
Haipeng Luo
Liyu Chen
309
21
0
26 May 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Neural Information Processing Systems (NeurIPS), 2022
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
305
25
0
31 Jan 2022
Cooperative Online Learning in Stochastic and Adversarial MDPs
International Conference on Machine Learning (ICML), 2022
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
381
4
0
31 Jan 2022
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
International Conference on Machine Learning (ICML), 2021
Tianhao Wu
Yunchang Yang
Han Zhong
Liwei Wang
S. Du
Jiantao Jiao
474
15
0
21 Dec 2021
Optimism and Delays in Episodic Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Benjamin Howson
Ciara Pike-Burke
Sarah Filippi
262
8
0
15 Nov 2021
Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays
Jiatai Huang
Yan Dai
Longbo Huang
AI4CE
387
2
0
26 Oct 2021
Reinforcement Learning for Feedback-Enabled Cyber Resilience
Yunhan Huang
Linan Huang
Quanyan Zhu
336
97
0
02 Jul 2021
Minimax Regret for Stochastic Shortest Path
Neural Information Processing Systems (NeurIPS), 2021
Alon Cohen
Yonathan Efroni
Yishay Mansour
Aviv A. Rosenberg
413
31
0
24 Mar 2021
No Weighted-Regret Learning in Adversarial Bandits with Delays
Journal of Machine Learning Research (JMLR), 2021
Ilai Bistritz
Zhengyuan Zhou
Xi Chen
Nicholas Bambos
Jose H. Blanchet
346
13
0
08 Mar 2021
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
Annual Conference Computational Learning Theory (COLT), 2021
Chen-Yu Wei
Haipeng Luo
OffRL
515
129
0
10 Feb 2021