TD(0) Learning converges for Polynomial mixing and non-linear functions
Theoretical work on Temporal Difference (TD) learning has provided finite-sample and high-probability guarantees for data generated from Markov chains. However, these bounds typically require linear function approximation, instance-dependent step sizes, algorithmic modifications, and restrictive mixing rates. We present theoretical guarantees for TD learning under more practical assumptions, including instance-independent step sizes, full data utilization, and polynomial ergodicity, covering both linear and non-linear function approximation. \textbf{To our knowledge, this is the first proof of TD(0) convergence on Markov data under universal and instance-independent step sizes.} While each relaxation is significant on its own, it is their combination that makes these bounds usable in practical application settings. Our results include bounds for linear models and for non-linear models under generalized gradients and Hölder continuity.
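To make the setting concrete, below is a minimal sketch of TD(0) with a constant, instance-independent step size and a non-linear value function, the regime the abstract describes. It is not the authors' implementation: the toy tanh network, the surrogate Markov data, and the parameters alpha and gamma are illustrative assumptions only.

import numpy as np

def value(theta, s):
    # Non-linear value estimate: one hidden layer of tanh units (illustrative choice).
    W, b, w = theta
    return float(w @ np.tanh(W @ s + b))

def grad_value(theta, s):
    # Gradient of the value estimate with respect to all parameters.
    W, b, w = theta
    h = np.tanh(W @ s + b)
    dw = h
    db = w * (1.0 - h ** 2)
    dW = np.outer(db, s)
    return dW, db, dw

def td0_update(theta, s, r, s_next, alpha=0.01, gamma=0.99):
    # One semi-gradient TD(0) step: theta <- theta + alpha * delta * grad V(s),
    # with alpha held constant (instance-independent) throughout training.
    delta = r + gamma * value(theta, s_next) - value(theta, s)
    dW, db, dw = grad_value(theta, s)
    W, b, w = theta
    return (W + alpha * delta * dW,
            b + alpha * delta * db,
            w + alpha * delta * dw)

# Usage on a toy correlated process standing in for Markov-chain data (every
# sample is used; no skipping or sub-sampling of the trajectory).
rng = np.random.default_rng(0)
dim, hidden = 4, 8
theta = (rng.normal(scale=0.1, size=(hidden, dim)),
         np.zeros(hidden),
         rng.normal(scale=0.1, size=hidden))
s = rng.normal(size=dim)
for t in range(1000):
    s_next = 0.9 * s + 0.1 * rng.normal(size=dim)  # stand-in for Markov data
    r = -np.linalg.norm(s)                         # stand-in reward
    theta = td0_update(theta, s, r, s_next)        # constant step size every step
    s = s_next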
@article{sridhar2025_2502.05706,
  title   = {TD(0) Learning converges for Polynomial mixing and non-linear functions},
  author  = {Anupama Sridhar and Alexander Johansen},
  journal = {arXiv preprint arXiv:2502.05706},
  year    = {2025}
}