On the Performance of Temporal Difference Learning With Neural Networks

International Conference on Learning Representations (ICLR), 2023
Abstract

Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(\theta_0, \omega)$, a ball of fixed radius $\omega$ around the initial point $\theta_0$. We show an approximation bound of $O(\epsilon) + \tilde{O}(1/\sqrt{m})$, where $\epsilon$ is the approximation quality of the best neural network in $B(\theta_0, \omega)$ and $m$ is the width of all hidden layers in the network.
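The projected variant described above can be sketched as follows. This is a hypothetical minimal example, not the paper's implementation: the one-hidden-layer ReLU network, the simulated transition, and all hyperparameters (`omega`, `alpha`, `gamma`, the width `m`) are illustrative assumptions. It shows the key mechanism: after each TD(0) semi-gradient step, the parameters are projected back onto $B(\theta_0, \omega)$.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 64       # hidden width (illustrative)
d = 4        # state feature dimension (illustrative)
gamma = 0.9  # discount factor
alpha = 0.01 # step size
omega = 1.0  # projection radius

# One-hidden-layer network V(s; theta) = a^T ReLU(W s) / sqrt(m),
# with theta_0 = (W0, a0) the random initialization.
W0 = rng.normal(size=(m, d))
a0 = rng.normal(size=m)
W, a = W0.copy(), a0.copy()

def value(W, a, s):
    return a @ np.maximum(W @ s, 0.0) / np.sqrt(m)

def grads(W, a, s):
    # Gradients of V(s; theta) with respect to W and a.
    h = W @ s
    ga = np.maximum(h, 0.0) / np.sqrt(m)
    gW = np.outer(a * (h > 0), s) / np.sqrt(m)
    return gW, ga

def project(W, a):
    # Euclidean projection of theta = (W, a) onto B(theta_0, omega):
    # rescale the displacement from theta_0 if it exceeds radius omega.
    dW, da = W - W0, a - a0
    norm = np.sqrt(np.sum(dW**2) + np.sum(da**2))
    if norm > omega:
        dW, da = dW * (omega / norm), da * (omega / norm)
    return W0 + dW, a0 + da

# One simulated transition (s, r, s') and one projected TD(0) step.
s, s_next, r = rng.normal(size=d), rng.normal(size=d), 1.0
delta = r + gamma * value(W, a, s_next) - value(W, a, s)  # TD error
gW, ga = grads(W, a, s)
W, a = W + alpha * delta * gW, a + alpha * delta * ga     # semi-gradient step
W, a = project(W, a)                                      # stay inside B(theta_0, omega)

dist = np.sqrt(np.sum((W - W0)**2) + np.sum((a - a0)**2))
print(dist <= omega + 1e-9)
```

The projection is what makes the analysis tractable: it keeps the iterates in a fixed neighborhood of the initialization, where the wide network behaves close to its linearization.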
