ResearchTrend.AI
A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

18 July 2016
S. Murphy
Yanzhen Deng
Eric B. Laber
H. Maei
R. Sutton
K. Witkiewitz
    OffRL

Papers citing "A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward"

5 / 5 papers shown
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
A. Qu
21
22
0
20 Oct 2021
Batch Policy Learning in Average Reward Markov Decision Processes
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
S. Murphy
OffRL
10
81
0
23 Jul 2020
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Peng Liao
P. Klasnja
S. Murphy
OffRL
11
66
0
30 Dec 2019
Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions
Feiyun Zhu
Jun Guo
Ruoyu Li
Junzhou Huang
OffRL
17
16
0
27 Feb 2018
Robust Contextual Bandit via the Capped-$\ell_{2}$ norm
Feiyun Zhu
Xinliang Zhu
Sheng Wang
Jiawen Yao
Junzhou Huang
OffRL
27
1
0
17 Aug 2017