Deterministic MDPs with Adversarial Rewards and Bandit Feedback

16 October 2012
R. Arora
O. Dekel
Ambuj Tewari

Papers citing "Deterministic MDPs with Adversarial Rewards and Bandit Feedback"

14 / 14 papers shown
Lower Bound on Howard Policy Iteration for Deterministic Markov Decision Processes
Ali Asadi
Krishnendu Chatterjee
Jakob de Raaij
5
0
0
13 Jun 2025
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Canzhe Zhao
Ruofeng Yang
Baoxiang Wang
Xuezhou Zhang
Shuai Li
67
3
0
14 Nov 2023
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes
Réda Alami
Mohammed Mahfoud
Eric Moulines
60
3
0
01 Apr 2023
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
Yan Dai
Haipeng Luo
Liyu Chen
110
19
0
26 May 2022
Reductive MDPs: A Perspective Beyond Temporal Horizons
Thomas Spooner
Rui Silva
J. Lockhart
Jason Long
Vacslav Glukhov
43
0
0
15 May 2022
Learning in Online MDPs: Is there a Price for Handling the Communicating Case?
Gautam Chandrasekaran
Ambuj Tewari
37
1
0
03 Nov 2021
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
Chen-Yu Wei
Haipeng Luo
OffRL
183
107
0
10 Feb 2021
Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control
Weichao Mao
Kai Zhang
Ruihao Zhu
D. Simchi-Levi
Tamer Başar
78
13
0
07 Oct 2020
Dynamic Regret of Policy Optimization in Non-stationary Environments
Yingjie Fei
Zhuoran Yang
Zhaoran Wang
Qiaomin Xie
91
56
0
30 Jun 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
95
96
0
24 Jun 2020
Corralling Stochastic Bandit Algorithms
R. Arora
T. V. Marinov
M. Mohri
115
35
0
16 Jun 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
86
105
0
03 Dec 2019
Policy Regret in Repeated Games
R. Arora
M. Dinitz
T. V. Marinov
M. Mohri
OffRL
49
17
0
09 Nov 2018
Relax but stay in control: from value to algorithms for online Markov decision processes
Peng Guan
Maxim Raginsky
Rebecca Willett
46
2
0
28 Oct 2013