A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

18 July 2016

Papers citing "A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward"


Title
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning Wenzhuo Zhou Ruoqing Zhu A. Qu 19 22 0 20 Oct 2021
Batch Policy Learning in Average Reward Markov Decision Processes Peng Liao Zhengling Qi Runzhe Wan P. Klasnja S. Murphy OffRL 10 81 0 23 Jul 2020
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health Peng Liao P. Klasnja S. Murphy OffRL 11 66 0 30 Dec 2019
Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions Feiyun Zhu Jun Guo Ruoyu Li Junzhou Huang OffRL 17 16 0 27 Feb 2018
Robust Contextual Bandit via the Capped-$\ell_{2}$ norm Feiyun Zhu Xinliang Zhu Sheng Wang Jiawen Yao Junzhou Huang OffRL 27 1 0 17 Aug 2017