A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in reinforcement learning. While prior work has established convergence guarantees, these results typically rely on the assumption that each iterate is projected onto a bounded set, or that the learning rate is set according to the unknown strong convexity constant, conditions that are artificial and do not match current practice. In this paper, we challenge the necessity of such assumptions and present a refined analysis of TD learning. We show that the simple projection-free variant converges at a sublinear rate, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
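To make the setting concrete, the following is a minimal sketch of projection-free TD(0) with linear function approximation, the algorithm class the abstract refers to. The Markov chain, feature map, and $1/\sqrt{t}$ step-size schedule below are illustrative assumptions for the sketch, not the paper's construction; note that no projection step bounds the iterates.

```python
import numpy as np

# A minimal sketch of projection-free TD(0) with linear function
# approximation. The random chain, rewards, features, and step-size
# schedule are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(0)

n_states, d, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
R = rng.uniform(size=n_states)                       # per-state rewards
Phi = rng.normal(size=(n_states, d))                 # feature matrix
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)    # unit-norm features

def td0(T=10_000):
    """Run T projection-free TD(0) updates along a single Markovian trajectory."""
    theta = np.zeros(d)
    s = 0
    for t in range(1, T + 1):
        s_next = rng.choice(n_states, p=P[s])  # Markovian (not i.i.d.) sampling
        # TD error: delta_t = r_t + gamma * phi(s')^T theta - phi(s)^T theta
        delta = R[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        # Update with decaying step size alpha_t = 0.1 / sqrt(t); crucially,
        # theta is NOT projected back onto any bounded set afterwards.
        theta += (0.1 / np.sqrt(t)) * delta * Phi[s]
        s = s_next
    return theta

theta = td0()
print(theta)
```

The paper's point is that, despite the absence of a projection, the iterates of such an update remain bounded, via a self-bounding property of the TD updates.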
@article{lee2025_2506.01052,
  title={A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity},
  author={Wei-Cheng Lee and Francesco Orabona},
  journal={arXiv preprint arXiv:2506.01052},
  year={2025}
}