ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient
Stochastic gradient algorithms have been the main focus of large-scale learning problems, and they have led to important successes in deep learning. The convergence of SGD depends on how carefully the learning rate is tuned and on the amount of noise in the stochastic gradient estimates. In this paper we propose a new adaptive learning rate algorithm that uses curvature information to automatically tune a separate learning rate for each parameter. The local curvature of the loss surface is estimated from local statistics of the stochastic first-order gradients. We also propose a new variance reduction technique that controls the amount of noise in the gradient estimates in order to speed up learning. In preliminary experiments with deep neural networks, the method outperformed other stochastic gradient algorithms.
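To make the idea concrete, here is a minimal illustrative sketch of adapting a per-parameter learning rate from running statistics of the stochastic gradients. This is a hypothetical rule written for illustration, not the paper's actual Adasecant update: the decay constant `beta`, the scaling `m**2 / v`, and the function names are all assumptions introduced here.

```python
import numpy as np

def adaptive_step(theta, grad, state, base_lr=0.01, beta=0.9, eps=1e-8):
    """One illustrative adaptive SGD step (NOT the paper's exact rule).

    Keeps exponential running estimates of E[g] and E[g^2] per parameter
    and shrinks the step for coordinates whose gradient is noisy, i.e.
    where E[g]^2 is small relative to E[g^2].
    """
    state["m"] = beta * state["m"] + (1 - beta) * grad        # running E[g]
    state["v"] = beta * state["v"] + (1 - beta) * grad ** 2   # running E[g^2]
    # Ratio in [0, 1]: close to 1 for consistent gradients, small for noisy ones.
    scale = state["m"] ** 2 / (state["v"] + eps)
    return theta - base_lr * scale * grad

# Usage sketch: minimize f(theta) = 0.5 * theta^2, whose gradient is theta.
theta = np.array([5.0])
state = {"m": np.zeros_like(theta), "v": np.zeros_like(theta)}
for _ in range(200):
    theta = adaptive_step(theta, theta.copy(), state)
```

The design point this sketch illustrates is that all curvature/noise information comes from first-order gradient statistics alone, with no Hessian computation; the paper's method builds a more refined estimate along the same lines.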