2210.16400
Flatter, faster: scaling momentum for optimal speedup of SGD
28 October 2022
Aditya Cowsik
T. Can
Paolo Glorioso
Papers citing "Flatter, faster: scaling momentum for optimal speedup of SGD"
4 / 4 papers shown
Title
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
MLT
83
98
0
13 Oct 2021
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
150
198
0
04 Mar 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,696
0
15 Sep 2016
The Loss Surfaces of Multilayer Networks
A. Choromańska
Mikael Henaff
Michaël Mathieu
Gerard Ben Arous
Yann LeCun
ODL
175
1,182
0
30 Nov 2014