arXiv: 2010.02916
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
6 October 2020
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
Papers citing "Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate" (17 papers shown)
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
  Chris Kolb, T. Weber, Bernd Bischl, David Rügamer (04 Feb 2025)
Normalization and effective learning rates in reinforcement learning
  Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, Will Dabney (01 Jul 2024)
Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization [OffRL]
  Shuo Xie, Zhiyuan Li (05 Apr 2024)
Directional Smoothness and Gradient Methods: Convergence and Adaptivity
  Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower (06 Mar 2024)
Analyzing and Improving the Training Dynamics of Diffusion Models
  Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine (05 Dec 2023)
An SDE for Modeling SAM: Theory and Insights
  Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi (19 Jan 2023)
Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
  Taiki Miyagawa (28 Oct 2022)
SGD with Large Step Sizes Learns Sparse Features
  Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion (11 Oct 2022)
Adapting the Linearised Laplace Model Evidence for Modern Deep Learning [UQCV, BDL]
  Javier Antorán, David Janz, J. Allingham, Erik A. Daxberger, Riccardo Barbano, Eric T. Nalisnick, José Miguel Hernández-Lobato (17 Jun 2022)
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction [FAtt]
  Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora (14 Jun 2022)
Robust Training of Neural Networks Using Scale Invariant Architectures
  Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar (02 Feb 2022)
Stochastic Training is Not Necessary for Generalization
  Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
  D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins (19 Jul 2021)
How to decay your learning rate
  Aitor Lewkowycz (23 Mar 2021)
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
  Zhiyuan Li, Sadhika Malladi, Sanjeev Arora (24 Feb 2021)
On the training dynamics of deep networks with L2 regularization
  Aitor Lewkowycz, Guy Gur-Ari (15 Jun 2020)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima [ODL]
  N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)