A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in reinforcement learning. While prior work has established convergence guarantees, these results typically rely on the assumption that each iterate is projected onto a bounded set, or that the learning rate is set according to the unknown strong convexity constant, conditions that are artificial and do not match current practice. In this paper, we challenge the necessity of such assumptions and present a refined analysis of TD learning. We show that the simple projection-free variant converges at a sublinear rate, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
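To make the setting concrete, the following is a minimal sketch of projection-free TD(0) with linear function approximation, the algorithm class the abstract refers to. The Markov chain, feature map, and $1/\sqrt{t}$ step-size schedule below are illustrative assumptions for the sketch, not the paper's construction; note that no projection step bounds the iterates.

```python
import numpy as np

# A minimal sketch of projection-free TD(0) with linear function
# approximation. The random chain, rewards, features, and step-size
# schedule are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(0)

n_states, d, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
R = rng.uniform(size=n_states)                       # per-state rewards
Phi = rng.normal(size=(n_states, d))                 # feature matrix
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)    # unit-norm features

def td0(T=10_000):
    """Run T projection-free TD(0) updates along a single Markovian trajectory."""
    theta = np.zeros(d)
    s = 0
    for t in range(1, T + 1):
        s_next = rng.choice(n_states, p=P[s])  # Markovian (not i.i.d.) sampling
        # TD error: delta_t = r_t + gamma * phi(s')^T theta - phi(s)^T theta
        delta = R[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        # Update with decaying step size alpha_t = 0.1 / sqrt(t); crucially,
        # theta is NOT projected back onto any bounded set afterwards.
        theta += (0.1 / np.sqrt(t)) * delta * Phi[s]
        s = s_next
    return theta

theta = td0()
print(theta)
```

The paper's point is that, despite the absence of a projection, the iterates of such an update remain bounded, via a self-bounding property of the TD updates.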
@article{lee2025_2506.01052,
  title={A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity},
  author={Wei-Cheng Lee and Francesco Orabona},
  journal={arXiv preprint arXiv:2506.01052},
  year={2025}
}