Backstepping Temporal Difference Learning

20 February 2023
Han-Dong Lim
Donghwan Lee
Abstract

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD-learning algorithms, including gradient TD-learning (GTD) and TD-learning with correction (TDC), have been developed to date. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective and propose a new convergent algorithm. Our method relies on the backstepping technique, which is widely used in nonlinear control theory. Finally, the convergence of the proposed algorithm is experimentally verified in environments where standard TD-learning is known to be unstable.
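As background for the divergence issue the abstract mentions, here is a minimal sketch (not from the paper) of the classic two-state counterexample for off-policy linear TD(0). All constants and the sampling scheme are illustrative assumptions: features phi(s1) = 1 and phi(s2) = 2, zero rewards, and a behavior distribution that only samples the s1 -> s2 transition.

# Illustrative two-state counterexample (assumed setup, not the paper's):
# phi(s1) = 1, phi(s2) = 2, reward = 0 everywhere, deterministic s1 -> s2.
# An off-policy behavior distribution that only visits s1 makes plain
# linear TD(0) diverge whenever gamma > 0.5.

gamma = 0.9   # discount factor
alpha = 0.1   # step size
w = 1.0       # single linear weight; the true value function has w* = 0

for step in range(50):
    phi, phi_next, reward = 1.0, 2.0, 0.0   # only the s1 -> s2 transition is sampled
    td_error = reward + gamma * phi_next * w - phi * w
    w += alpha * td_error * phi             # w scales by 1 + alpha*(2*gamma - 1) each step

print(w)   # blows up: roughly (1.08)**50, about 47x the initial weight

Gradient-TD methods such as GTD and TDC avoid this blow-up by descending a projected Bellman-error objective rather than following the raw TD update; this is the family of algorithms the paper reinterprets through the control-theoretic backstepping lens.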

@article{lim2025_2302.09875,
  title={Backstepping Temporal Difference Learning},
  author={Han-Dong Lim and Donghwan Lee},
  journal={arXiv preprint arXiv:2302.09875},
  year={2025}
}