TD(0) Learning converges for Polynomial mixing and non-linear functions

8 February 2025

Abstract

Theoretical work on Temporal Difference (TD) learning has provided finite-sample and high-probability guarantees for data generated from Markov chains. However, these bounds typically require linear function approximation, instance-dependent step sizes, algorithmic modifications, and restrictive mixing rates. We present theoretical findings for TD learning under more applicable assumptions, including instance-independent step sizes, full data utilization, and polynomial ergodicity, applicable to both linear and non-linear functions. \textbf{To our knowledge, this is the first proof of TD(0) convergence on Markov data under universal and instance-independent step sizes.} While each contribution is significant on its own, their combination allows these bounds to be effectively utilized in practical application settings. Our results include bounds for linear models and non-linear under generalized gradients and Hölder continuity.

View on arXiv

@article{sridhar2025_2502.05706,
  title={ TD(0) Learning converges for Polynomial mixing and non-linear functions },
  author={ Anupama Sridhar and Alexander Johansen },
  journal={arXiv preprint arXiv:2502.05706},
  year={ 2025 }
}

Comments on this paper