The Break-Even Point on Optimization Trajectories of Deep Neural Networks
arXiv:2002.09572
21 February 2020
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
Papers citing "The Break-Even Point on Optimization Trajectories of Deep Neural Networks"
Showing 34 of 34 citing papers.
| Title | Authors | Tags | Citations | Date |
|---|---|---|---|---|
| Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos | Dayal Singh Kalra, Tianyu He, M. Barkeshli | | 4 | 17 Feb 2025 |
| Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks | Pierfrancesco Beneventano, Blake Woodworth | MLT | 1 | 15 Jan 2025 |
| Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training | Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan | AAML | 0 | 14 Oct 2024 |
| Can Optimization Trajectories Explain Multi-Task Transfer? | David Mueller, Mark Dredze, Nicholas Andrews | | 1 | 26 Aug 2024 |
| Does SGD really happen in tiny subspaces? | Minhak Song, Kwangjun Ahn, Chulhee Yun | | 4 | 25 May 2024 |
| SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data | Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy | FedML | 0 | 22 May 2024 |
| High dimensional analysis reveals conservative sharpening and a stochastic edge of stability | Atish Agarwala, Jeffrey Pennington | | 3 | 30 Apr 2024 |
| Investigation into the Training Dynamics of Learned Optimizers | Jan Sobotka, Petr Simánek, Daniel Vasata | | 0 | 12 Dec 2023 |
| A Coefficient Makes SVRG Effective | Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu | | 1 | 09 Nov 2023 |
| From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression | Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla | | 7 | 02 Oct 2023 |
| Sharpness-Aware Minimization and the Edge of Stability | Philip M. Long, Peter L. Bartlett | AAML | 9 | 21 Sep 2023 |
| mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization | Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder | AAML | 7 | 19 Feb 2023 |
| SAM operates far from home: eigenvalue regularization as a dynamical phenomenon | Atish Agarwala, Yann N. Dauphin | | 20 | 17 Feb 2023 |
| On a continuous time model of gradient descent dynamics and instability in deep learning | Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin | | 6 | 03 Feb 2023 |
| Learning threshold neurons via the "edge of stability" | Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang | MLT | 36 | 14 Dec 2022 |
| Leveraging Unlabeled Data to Track Memorization | Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran | NoLa, TDI | 3 | 08 Dec 2022 |
| A survey of deep learning optimizers -- first and second order methods | Rohan Kashyap | ODL | 6 | 28 Nov 2022 |
| Easy Begun is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout | Hongjun Wang, Jiyuan Chen, Tongbo Pan, Z. Fan, Boyuan Zhang, Renhe Jiang, Lingyu Zhang, Yi Xie, Zhongyin Wang, Xuan Song | GNN | 8 | 28 Nov 2022 |
| Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach | Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao | AAML | 69 | 11 Oct 2022 |
| Understanding Edge-of-Stability Training Dynamics with a Minimalist Example | Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge | | 35 | 07 Oct 2022 |
| Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging | Jean Kaddour | MoMe, 3DH | 39 | 29 Sep 2022 |
| Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability | Z. Li, Zixuan Wang, Jian Li | | 42 | 26 Jul 2022 |
| Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction | Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora | FAtt | 69 | 14 Jun 2022 |
| Linear Connectivity Reveals Generalization Strategies | Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra | | 45 | 24 May 2022 |
| High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation | Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang | MLT | 121 | 03 May 2022 |
| How Do Vision Transformers Work? | Namuk Park, Songkuk Kim | ViT | 465 | 14 Feb 2022 |
| Exponential escape efficiency of SGD from sharp minima in non-stationary regime | Hikaru Ibayashi, Masaaki Imaizumi | | 4 | 07 Nov 2021 |
| Logit Attenuating Weight Normalization | Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi | | 1 | 12 Aug 2021 |
| What can linear interpolation of neural network loss landscapes tell us? | Tiffany J. Vlaar, Jonathan Frankle | MoMe | 27 | 30 Jun 2021 |
| Consensus Control for Decentralized Deep Learning | Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich | | 75 | 09 Feb 2021 |
| A Random Matrix Theory Approach to Damping in Deep Learning | Diego Granziol, Nicholas P. Baskerville | AI4CE, ODL | 2 | 15 Nov 2020 |
| Sharpness-Aware Minimization for Efficiently Improving Generalization | Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur | AAML | 1,276 | 03 Oct 2020 |
| The large learning rate phase of deep learning: the catapult mechanism | Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari | ODL | 234 | 04 Mar 2020 |
| On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 2,888 | 15 Sep 2016 |