
An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Abstract

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
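For readers unfamiliar with the algorithm family analyzed here, the following is a minimal, illustrative sketch of semi-gradient TD(0) with a small neural value network in plain NumPy. The network architecture, step size, discount factor, and all variable names are assumptions for illustration only and are not taken from the paper.

```python
# Minimal sketch (NOT the paper's method): semi-gradient TD(0) with a
# two-layer ReLU value network. All sizes and hyperparameters are assumed.
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 32                  # state dimension and hidden width (assumed)
W1 = rng.normal(scale=1 / np.sqrt(d), size=(m, d))
w2 = rng.normal(scale=1 / np.sqrt(m), size=m)
alpha, gamma = 0.01, 0.99     # step size and discount factor (assumed)

def value(s):
    """Two-layer ReLU network V(s) = w2^T ReLU(W1 s); also returns activations."""
    h = np.maximum(W1 @ s, 0.0)
    return w2 @ h, h

def td0_update(s, r, s_next):
    """One semi-gradient TD(0) step: theta <- theta + alpha * delta * grad V(s)."""
    global W1, w2
    v, h = value(s)
    v_next, _ = value(s_next)
    delta = r + gamma * v_next - v        # TD error; bootstrapped target held fixed
    grad_w2 = h                           # dV(s)/dw2
    grad_W1 = np.outer(w2 * (h > 0), s)   # dV(s)/dW1 through the ReLU
    w2 += alpha * delta * grad_w2
    W1 += alpha * delta * grad_W1
    return delta

# Example: one update on a synthetic transition (purely illustrative).
s, s_next = rng.normal(size=d), rng.normal(size=d)
print(td0_update(s, r=1.0, s_next=s_next))
```

The update treats the bootstrapped target $r + \gamma V(s')$ as a constant when differentiating, which is what makes this a semi-gradient method rather than true gradient descent on a fixed objective.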
