Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

7 October 2021
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao
AI4CE

Papers citing "Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect"

14 papers

1. Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
2. Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
3. Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025. [MLT]
4. How to escape sharp minima with random perturbations. Kwangjun Ahn, Ali Jadbabaie, S. Sra. 25 May 2023. [ODL]
5. Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond. Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon. 22 May 2023.
6. Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability. Jingfeng Wu, Vladimir Braverman, Jason D. Lee. 19 May 2023.
7. Convergence of Alternating Gradient Descent for Matrix Factorization. R. Ward, T. Kolda. 11 May 2023.
8. Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022. [MLT]
9. Understanding Edge-of-Stability Training Dynamics with a Minimalist Example. Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge. 07 Oct 2022.
10. On the Implicit Bias in Deep-Learning Algorithms. Gal Vardi. 26 Aug 2022. [FedML, AI4CE]
11. Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization. Tian-Chun Ye, S. Du. 27 Jun 2021.
12. A Comparison of Optimization Algorithms for Deep Learning. Derya Soydaner. 28 Jul 2020.
13. The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020. [ODL]
14. A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. L. Smith. 26 Mar 2018.