A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

Papers citing "A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward"