Temporal Difference Learning as Gradient Splitting

27 October 2020
Rui Liu
Alexander Olshevsky
ArXiv (abs) · PDF · HTML
Abstract

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a new, fuller explanation of why temporal difference learning works, our interpretation also yields improved convergence times. We consider the setting with $1/\sqrt{T}$ step-size, where previous comparable finite-time convergence bounds for temporal difference learning had the multiplicative factor $1/(1-\gamma)$ in front of the bound, with $\gamma$ being the discount factor. We show that a minor variation on TD learning which estimates the mean of the value function separately has a convergence time where $1/(1-\gamma)$ only multiplies an asymptotically negligible term.

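Since the abstract centers on TD(0) with linear function approximation and a $1/\sqrt{T}$ step-size, the following minimal Python sketch illustrates that baseline algorithm on a small synthetic Markov reward process. The environment (transition matrix `P`, rewards `r`, feature matrix `Phi`) and the Polyak averaging of the iterates are illustrative assumptions, not the paper's construction, and this is not the authors' variant that estimates the mean of the value function separately.

    # A minimal sketch (not the authors' code) of TD(0) with linear function
    # approximation. Only the TD(0) update rule and the 1/sqrt(T) step-size
    # correspond to the setting described in the abstract; the MDP, features,
    # and iterate averaging below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, dim, gamma, T = 20, 5, 0.9, 50_000

    # Random ergodic transition matrix and rewards for the policy being evaluated.
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)
    r = rng.random(n_states)

    # Each state s is represented by the feature vector Phi[s] in R^dim.
    Phi = rng.normal(size=(n_states, dim))

    theta = np.zeros(dim)        # linear value-function parameters
    theta_avg = np.zeros(dim)    # averaged iterate, as in 1/sqrt(T)-step-size analyses
    alpha = 1.0 / np.sqrt(T)     # step-size scaled as 1/sqrt(T)

    s = rng.integers(n_states)
    for t in range(T):
        s_next = rng.choice(n_states, p=P[s])
        # TD(0) semi-gradient update with linear function approximation.
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta += alpha * td_error * Phi[s]
        theta_avg += (theta - theta_avg) / (t + 1)
        s = s_next

    print("estimated values (averaged iterate):", Phi @ theta_avg)

The averaged iterate `theta_avg` is reported because finite-time guarantees at a $1/\sqrt{T}$ step-size are typically stated for the average of the iterates rather than the final one.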