Recurrent neural network training with preconditioned stochastic gradient descent
Abstract
Recurrent neural networks (RNN), especially the ones requiring extremely long term memories, are difficult to train. Hence, they provide an ideal testbed for benchmarking the performance of optimization algorithms. This paper reports test results of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on RNN training. We find that PSGD may outperform Hessian-free optimization, which achieves state-of-the-art performance on the target problems, even though PSGD is only slightly more complicated than stochastic gradient descent (SGD) and is user friendly, being a virtually tuning-free algorithm.
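To give a concrete sense of the preconditioning idea, here is a minimal sketch of a PSGD-style update with a diagonal preconditioner on a toy ill-conditioned quadratic. It is an illustrative simplification, not the paper's implementation: the paper fits a full preconditioner Q with P = QᵀQ, whereas this sketch keeps only a diagonal P, and the step sizes, initialization, and finite-difference curvature probe below are assumptions chosen for the toy problem.

```python
# Minimal diagonal-PSGD sketch on a toy quadratic (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-conditioned quadratic: loss(theta) = 0.5 * theta^T H theta
H = np.diag([100.0, 1.0, 0.01])

def grad(theta):
    return H @ theta

theta = rng.standard_normal(3)
# Small initial diagonal preconditioner to keep early steps stable
# (an assumption made for this toy problem).
p = 0.01 * np.ones_like(theta)
lr, precond_lr, eps = 0.5, 0.1, 1e-12

for step in range(200):
    g = grad(theta)

    # Probe curvature with a random perturbation: delta_g is a
    # finite-difference approximation of the Hessian-vector product.
    delta_theta = 1e-4 * rng.standard_normal(3)
    delta_g = grad(theta + delta_theta) - g

    # Move the diagonal preconditioner toward p_i ~ |delta_theta_i| / |delta_g_i|,
    # the single-sample solution of PSGD's fitting criterion restricted to
    # diagonal P; this drives the preconditioned curvature toward identity.
    target = np.abs(delta_theta) / (np.abs(delta_g) + eps)
    p += precond_lr * (target - p)

    # Preconditioned SGD update: theta <- theta - lr * P * g
    theta -= lr * p * g

print("final loss:", 0.5 * theta @ H @ theta)
```

The design point this illustrates is why PSGD is nearly tuning-free: once the preconditioner adapts, every coordinate sees roughly unit curvature, so a single step size works across directions whose raw curvatures differ by orders of magnitude.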
