1503.09105
v14 (latest)
Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning
31 March 2015
Prasenjit Karmakar
S. Bhatnagar
Papers citing
"Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning"
20 / 20 papers shown
Autonomous Curriculum Design via Relative Entropy Based Task Modifications
Muhammed Yusuf Satici
Jianxun Wang
David L. Roberts
71
0
0
28 Feb 2025
Bridging Physics-Informed Neural Networks with Reinforcement Learning: Hamilton-Jacobi-Bellman Proximal Policy Optimization (HJBPPO)
Amartya Mukherjee
Jun Liu
60
11
0
01 Feb 2023
Finite-Time Error Bounds for Greedy-GQ
Yue Wang
Yi Zhou
Shaofeng Zou
98
2
0
06 Sep 2022
Schedule Based Temporal Difference Algorithms
Rohan Deb
Meet Gandhi
S. Bhatnagar
23
0
0
23 Nov 2021
Gradient Temporal Difference with Momentum: Stability and Convergence
Rohan Deb
S. Bhatnagar
43
5
0
22 Nov 2021
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
Vivek Borkar
Shuhang Chen
Adithya M. Devraj
Ioannis Kontoyiannis
Sean P. Meyn
68
32
0
27 Oct 2021
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning
Sihan Zeng
Thinh T. Doan
Justin Romberg
163
26
0
29 Sep 2021
Bayesian Bellman Operators
M. Fellows
Kristian Hartikainen
Shimon Whiteson
OffRL
81
17
0
09 Jun 2021
Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER
Markus Holzleitner
Lukas Gruber
Jose A. Arjona-Medina
Johannes Brandstetter
Sepp Hochreiter
69
39
0
02 Dec 2020
Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms
Tengyu Xu
Yingbin Liang
93
26
0
10 Nov 2020
Two-Timescale Stochastic Gradient Descent in Continuous Time with Applications to Joint Online Parameter Estimation and Optimal Sensor Placement
Louis Sharrock
N. Kantas
64
7
0
31 Jul 2020
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Mingyi Hong
Hoi-To Wai
Zhaoran Wang
Zhuoran Yang
86
140
0
10 Jul 2020
Whittle index based Q-learning for restless bandits with average reward
Konstantin Avrachenkov
Vivek Borkar
69
70
0
29 Apr 2020
A Game Theoretic Framework for Model Based Reinforcement Learning
Aravind Rajeswaran
Igor Mordatch
Vikash Kumar
OffRL
58
128
0
16 Apr 2020
Zap Q-Learning With Nonlinear Function Approximation
Shuhang Chen
Adithya M. Devraj
Fan Lu
Ana Bušić
Sean P. Meyn
67
20
0
11 Oct 2019
Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991)
J. Schmidhuber
GAN
DRL
110
57
0
11 Jun 2019
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
Huizhen Yu
OffRL
99
32
0
27 Dec 2017
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
M. Heusel
Hubert Ramsauer
Thomas Unterthiner
Bernhard Nessler
Sepp Hochreiter
95
465
0
26 Jun 2017
On Generalized Bellman Equations and Temporal-Difference Learning
Huizhen Yu
A. R. Mahmood
R. Sutton
118
29
0
14 Apr 2017
Multi-step Off-policy Learning Without Importance Sampling Ratios
A. R. Mahmood
Huizhen Yu
R. Sutton
OffRL
143
54
0
09 Feb 2017