
How to Set $\beta_1, \beta_2$ in Adam: An Online Learning Perspective

Main: 14 pages · 1 figure · Bibliography: 1 page · Appendix: 1 page
Abstract

While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $\beta_1$ and $\beta_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The analyses in these works required setting $\beta_1 = \sqrt{\beta_2}$, which does not cover the more practical cases with $\beta_1 \neq \sqrt{\beta_2}$. We derive novel, more general analyses that hold for both $\beta_1 \geq \sqrt{\beta_2}$ and $\beta_1 \leq \sqrt{\beta_2}$. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting $\beta_1 = \sqrt{\beta_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.
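
For readers who want to see exactly where $\beta_1$ and $\beta_2$ enter, below is a minimal NumPy sketch of the standard Adam recursion (Kingma & Ba, 2015). It is not the paper's FTRL analysis, only the update rule being analyzed; the function name `adam_step` and the default hyperparameter values are illustrative, taken from the common defaults rather than from this paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t >= 1 (sketch, not the paper's method).

    beta1 weights the first-moment EMA (momentum); beta2 weights the
    second-moment EMA (adaptive step size). These are the two factors
    whose relative setting (beta1 vs. sqrt(beta2)) the paper studies.
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for m
    v_hat = v / (1 - beta2 ** t)                 # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that the widely used defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$ give $\beta_1 < \sqrt{\beta_2} \approx 0.9995$, i.e., they fall in the $\beta_1 \leq \sqrt{\beta_2}$ regime rather than on the $\beta_1 = \sqrt{\beta_2}$ line assumed by the prior FTRL-based analyses.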
