Stochasticity of Deterministic Gradient Descent: Large Learning Rate for
Multiscale Objective Function

14 February 2020

Lingkai Kong

Papers citing "Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function"

Title
Leveraging chaos in the training of artificial neural networks. Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa. 25 0 0. 10 Jun 2025
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 83 2 0. 05 Apr 2025
The boundary of neural network trainability is fractal. Jascha Narain Sohl-Dickstein. 80 9 0. 09 Feb 2024
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora. FAtt. 121 75 0. 14 Jun 2022
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes. Chao Ma, D. Kunin, Lei Wu, Lexing Ying. 93 30 0. 24 Apr 2022
Gradients are Not All You Need. Luke Metz, C. Freeman, S. Schoenholz, Tal Kachman. 98 93 0. 10 Nov 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. AI4CE. 138 42 0. 07 Oct 2021
Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 173 76 0. 29 Sep 2021