
Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

31 March 2015

Papers citing "Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning"


Title | Authors | Tags | Counts | Date
Autonomous Curriculum Design via Relative Entropy Based Task Modifications | Muhammed Yusuf Satici, Jianxun Wang, David L. Roberts | - | 71 0 0 | 28 Feb 2025
Bridging Physics-Informed Neural Networks with Reinforcement Learning: Hamilton-Jacobi-Bellman Proximal Policy Optimization (HJBPPO) | Amartya Mukherjee, Jun Liu | - | 60 11 0 | 01 Feb 2023
Finite-Time Error Bounds for Greedy-GQ | Yue Wang, Yi Zhou, Shaofeng Zou | - | 98 2 0 | 06 Sep 2022
Schedule Based Temporal Difference Algorithms | Rohan Deb, Meet Gandhi, S. Bhatnagar | - | 23 0 0 | 23 Nov 2021
Gradient Temporal Difference with Momentum: Stability and Convergence | Rohan Deb, S. Bhatnagar | - | 43 5 0 | 22 Nov 2021
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning | Vivek Borkar, Shuhang Chen, Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn | - | 68 32 0 | 27 Oct 2021
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning | Sihan Zeng, Thinh T. Doan, Justin Romberg | - | 163 26 0 | 29 Sep 2021
Bayesian Bellman Operators | M. Fellows, Kristian Hartikainen, Shimon Whiteson | OffRL | 81 17 0 | 09 Jun 2021
Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER | Markus Holzleitner, Lukas Gruber, Jose A. Arjona-Medina, Johannes Brandstetter, Sepp Hochreiter | - | 69 39 0 | 02 Dec 2020
Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms | Tengyu Xu, Yingbin Liang | - | 93 26 0 | 10 Nov 2020
Two-Timescale Stochastic Gradient Descent in Continuous Time with Applications to Joint Online Parameter Estimation and Optimal Sensor Placement | Louis Sharrock, N. Kantas | - | 64 7 0 | 31 Jul 2020
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic | Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang | - | 86 140 0 | 10 Jul 2020
Whittle index based Q-learning for restless bandits with average reward | Konstantin Avrachenkov, Vivek Borkar | - | 69 70 0 | 29 Apr 2020
A Game Theoretic Framework for Model Based Reinforcement Learning | Aravind Rajeswaran, Igor Mordatch, Vikash Kumar | OffRL | 58 128 0 | 16 Apr 2020
Zap Q-Learning With Nonlinear Function Approximation | Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn | - | 67 20 0 | 11 Oct 2019
Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991) | J. Schmidhuber | GAN, DRL | 110 57 0 | 11 Jun 2019
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning | Huizhen Yu | OffRL | 99 32 0 | 27 Dec 2017
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium | M. Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter | - | 95 465 0 | 26 Jun 2017
On Generalized Bellman Equations and Temporal-Difference Learning | Huizhen Yu, A. R. Mahmood, R. Sutton | - | 118 29 0 | 14 Apr 2017
Multi-step Off-policy Learning Without Importance Sampling Ratios | A. R. Mahmood, Huizhen Yu, R. Sutton | OffRL | 143 54 0 | 09 Feb 2017