Weak Convergence Properties of Constrained Emphatic Temporal-difference
Learning with Constant and Slowly Diminishing Stepsize

23 November 2015

Huizhen Yu

Papers citing "Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize"

Title | Authors | Tag | Counts | Date
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang | - | 47 0 0 | 27 May 2025
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning | Wenzhuo Zhou, Ruoqing Zhu, Annie Qu | - | 79 22 0 | 20 Oct 2021
A Study of Policy Gradient on a Class of Exactly Solvable Models | Gavin McCracken, Colin Daniels, Rosie Zhao, Anna M. Brandenberger, Prakash Panangaden, Doina Precup | - | 36 0 0 | 03 Nov 2020
Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning | M. Stanković, M. Beko, S. Stankovic | OffRL | 10 16 0 | 18 Jun 2020
Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence | Abhishek Gupta, W. Haskell | - | 13 5 0 | 25 Mar 2020
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning | Huizhen Yu | OffRL | 99 32 0 | 27 Dec 2017
On Generalized Bellman Equations and Temporal-Difference Learning | Huizhen Yu, A. R. Mahmood, R. Sutton | - | 118 29 0 | 14 Apr 2017
Multi-step Off-policy Learning Without Importance Sampling Ratios | A. R. Mahmood, Huizhen Yu, R. Sutton | OffRL | 143 54 0 | 09 Feb 2017
Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms | Huizhen Yu | - | 44 2 0 | 06 May 2016
Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning | Prasenjit Karmakar, S. Bhatnagar | - | 89 27 0 | 31 Mar 2015
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning | R. Sutton, A. R. Mahmood, Martha White | - | 103 272 0 | 14 Mar 2015