Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning
Version 14 (latest)

31 March 2015
Prasenjit Karmakar, S. Bhatnagar
arXiv (abs) · PDF · HTML
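
For context on the title, the generic two-timescale stochastic approximation scheme analyzed in this line of work couples a fast and a slow iterate with decreasing step sizes. The sketch below is only the standard textbook form: the notation (x_n, y_n, a(n), b(n), h, g, and the noise terms M) is generic and not taken from this page, and the paper's specific setting additionally involves controlled Markov noise, which is not shown here.

% Standard two-timescale stochastic approximation recursion (generic textbook
% form; notation is illustrative and not drawn from the paper itself).
\begin{align*}
  x_{n+1} &= x_n + a(n)\,\bigl[h(x_n, y_n) + M^{(1)}_{n+1}\bigr], \\
  y_{n+1} &= y_n + b(n)\,\bigl[g(x_n, y_n) + M^{(2)}_{n+1}\bigr],
\end{align*}
% with step sizes satisfying $\sum_n a(n) = \sum_n b(n) = \infty$,
% $\sum_n \bigl(a(n)^2 + b(n)^2\bigr) < \infty$, and $b(n)/a(n) \to 0$,
% so that $y_n$ evolves on the slower timescale and sees $x_n$ as
% quasi-equilibrated.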

Papers citing "Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning"

20 papers shown

  1. Autonomous Curriculum Design via Relative Entropy Based Task Modifications. Muhammed Yusuf Satici, Jianxun Wang, David L. Roberts. 28 Feb 2025.
  2. Bridging Physics-Informed Neural Networks with Reinforcement Learning: Hamilton-Jacobi-Bellman Proximal Policy Optimization (HJBPPO). Amartya Mukherjee, Jun Liu. 01 Feb 2023.
  3. Finite-Time Error Bounds for Greedy-GQ. Yue Wang, Yi Zhou, Shaofeng Zou. 06 Sep 2022.
  4. Schedule Based Temporal Difference Algorithms. Rohan Deb, Meet Gandhi, S. Bhatnagar. 23 Nov 2021.
  5. Gradient Temporal Difference with Momentum: Stability and Convergence. Rohan Deb, S. Bhatnagar. 22 Nov 2021.
  6. The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning. Vivek Borkar, Shuhang Chen, Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn. 27 Oct 2021.
  7. A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning. Sihan Zeng, Thinh T. Doan, Justin Romberg. 29 Sep 2021.
  8. Bayesian Bellman Operators. M. Fellows, Kristian Hartikainen, Shimon Whiteson. 09 Jun 2021. (OffRL)
  9. Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER. Markus Holzleitner, Lukas Gruber, Jose A. Arjona-Medina, Johannes Brandstetter, Sepp Hochreiter. 02 Dec 2020.
  10. Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms. Tengyu Xu, Yingbin Liang. 10 Nov 2020.
  11. Two-Timescale Stochastic Gradient Descent in Continuous Time with Applications to Joint Online Parameter Estimation and Optimal Sensor Placement. Louis Sharrock, N. Kantas. 31 Jul 2020.
  12. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic. Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang. 10 Jul 2020.
  13. Whittle index based Q-learning for restless bandits with average reward. Konstantin Avrachenkov, Vivek Borkar. 29 Apr 2020.
  14. A Game Theoretic Framework for Model Based Reinforcement Learning. Aravind Rajeswaran, Igor Mordatch, Vikash Kumar. 16 Apr 2020. (OffRL)
  15. Zap Q-Learning With Nonlinear Function Approximation. Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn. 11 Oct 2019.
  16. Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991). J. Schmidhuber. 11 Jun 2019. (GAN, DRL)
  17. On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning. Huizhen Yu. 27 Dec 2017. (OffRL)
  18. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. M. Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter. 26 Jun 2017.
  19. On Generalized Bellman Equations and Temporal-Difference Learning. Huizhen Yu, A. R. Mahmood, R. Sutton. 14 Apr 2017.
  20. Multi-step Off-policy Learning Without Importance Sampling Ratios. A. R. Mahmood, Huizhen Yu, R. Sutton. 09 Feb 2017. (OffRL)