1607.05047
Cited By
A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward
18 July 2016
S. Murphy
Yanzhen Deng
Eric B. Laber
H. Maei
R. Sutton
K. Witkiewitz
OffRL
Papers citing
"A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward"
5 / 5 papers shown
Title
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
A. Qu
19
22
0
20 Oct 2021
Batch Policy Learning in Average Reward Markov Decision Processes
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
S. Murphy
OffRL
10
81
0
23 Jul 2020
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Peng Liao
P. Klasnja
S. Murphy
OffRL
11
66
0
30 Dec 2019
Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions
Feiyun Zhu
Jun Guo
Ruoyu Li
Junzhou Huang
OffRL
17
16
0
27 Feb 2018
Robust Contextual Bandit via the Capped-$\ell_{2}$ norm
Feiyun Zhu
Xinliang Zhu
Sheng Wang
Jiawen Yao
Junzhou Huang
OffRL
27
1
0
17 Aug 2017