Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2105.09557
Cited By
Power-law escape rate of SGD
20 May 2021
Takashi Mori
Liu Ziyin
Kangqiao Liu
Masakuni Ueda
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Power-law escape rate of SGD"
7 / 7 papers shown
Title
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
94
2
0
21 Feb 2025
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
62
0
0
14 Oct 2024
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash
Ofir Lindenbaum
27
9
0
05 Mar 2023
On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba
Hossein Azizpour
Marten Bjorkman
25
7
0
28 Jan 2023
Exact Solutions of a Deep Linear Network
Liu Ziyin
Botao Li
Xiangmin Meng
ODL
19
21
0
10 Feb 2022
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry
Fabrizio Pittorino
Antonio Ferraro
Gabriele Perugini
Christoph Feinauer
Carlo Baldassi
R. Zecchina
204
24
0
07 Feb 2022
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
284
2,889
0
15 Sep 2016
1