Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

27 May 2019

Boris Ginsburg

Papers citing "Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks"

Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape. Devansh Bisla, Jing Wang, A. Choromańska. 20 Jan 2022.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019.
Bag of Tricks for Image Classification with Convolutional Neural Networks. Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. 04 Dec 2018.