Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

8 March 2024

Papers citing "Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks"

Title
Optimization Insights into Deep Diagonal Linear Networks | Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega | 66 0 0 | 21 Dec 2024
The AdEMAMix Optimizer: Better, Faster, Older | Matteo Pagliardini, Pierre Ablin, David Grangier | ODL | 28 8 0 | 05 Sep 2024
Implicit Bias of Mirror Flow on Separable Data | Scott Pesme, Radu-Alexandru Dragomir, Nicolas Flammarion | 27 1 0 | 18 Jun 2024
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults | Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun | 10 0 0 | 25 Nov 2023
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights | Weijie Su, Stephen P. Boyd, Emmanuel J. Candes | 97 1,150 0 | 04 Mar 2015