Deterministic MDPs with Adversarial Rewards and Bandit Feedback

16 October 2012

Papers citing "Deterministic MDPs with Adversarial Rewards and Bandit Feedback"


Lower Bound on Howard Policy Iteration for Deterministic Markov Decision Processes. Ali Asadi, Krishnendu Chatterjee, Jakob de Raaij. 13 Jun 2025.
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback. Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li. 14 Nov 2023.
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes. Réda Alami, Mohammed Mahfoud, Eric Moulines. 01 Apr 2023.
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback. Yan Dai, Haipeng Luo, Liyu Chen. 26 May 2022.
Reductive MDPs: A Perspective Beyond Temporal Horizons. Thomas Spooner, Rui Silva, J. Lockhart, Jason Long, Vacslav Glukhov. 15 May 2022.
Learning in Online MDPs: Is there a Price for Handling the Communicating Case? Gautam Chandrasekaran, Ambuj Tewari. 03 Nov 2021.
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach. Chen-Yu Wei, Haipeng Luo. 10 Feb 2021.
Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control. Weichao Mao, Kai Zhang, Ruihao Zhu, D. Simchi-Levi, Tamer Başar. 07 Oct 2020.
Dynamic Regret of Policy Optimization in Non-stationary Environments. Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie. 30 Jun 2020.
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism. Wang Chi Cheung, D. Simchi-Levi, Ruihao Zhu. 24 Jun 2020.
Corralling Stochastic Bandit Algorithms. R. Arora, T. V. Marinov, M. Mohri. 16 Jun 2020.
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition. Chi Jin, Tiancheng Jin, Haipeng Luo, S. Sra, Tiancheng Yu. 03 Dec 2019.
Policy Regret in Repeated Games. R. Arora, M. Dinitz, T. V. Marinov, M. Mohri. 09 Nov 2018.
Relax but stay in control: from value to algorithms for online Markov decision processes. Peng Guan, Maxim Raginsky, Rebecca Willett. 28 Oct 2013.