How to Set β₁, β₂ in Adam: An Online Learning Perspective
While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, β₁ and β₂, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The analyses in these prior works required setting β₁ = β₂, which does not cover the more practical cases with β₁ < β₂. We derive novel, more general analyses that hold for both β₁ ≤ β₂ and β₁ ≥ β₂. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting β₁ = β₂ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.