arXiv: 1807.05031
On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
13 July 2018
Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
Tags: ODL
Papers citing "On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length" (17 of 17 papers shown)
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.

Fairness Without Demographics in Human-Centered Federated Learning
Shaily Roy, Harshit Sharma, Asif Salekin. 30 Apr 2024.

Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi, Martin Swany. Tags: FedML. 16 Jul 2023.

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin. 03 Feb 2023.

Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization
Heting Liu, Fang He, Guohong Cao. Tags: FedML, MQ. 16 Dec 2022.

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. Tags: MLT. 14 Dec 2022.

Understanding the unstable convergence of gradient descent
Kwangjun Ahn, J. Zhang, S. Sra. 03 Apr 2022.

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.

Consensus Control for Decentralized Deep Learning
Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich. 09 Feb 2021.

A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, V. Balasubramanian. 07 Dec 2020.

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts. Tags: ODL. 16 Jun 2020.

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras. 21 Feb 2020.

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
Colin Wei, Tengyu Ma. Tags: AAML, OOD. 09 Oct 2019.

GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks
Avraam Chatzimichailidis, Franz-Josef Pfreundt, N. Gauger, J. Keuper. 26 Sep 2019.

Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort, Pawel Krzysztof Nowak, Stanislaw Jastrzebski, S. Narayanan. 28 Jan 2019.

Laplacian Smoothing Gradient Descent
Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, A. Lin. Tags: ODL. 17 Jun 2018.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. Tags: ODL. 15 Sep 2016.