Three Factors Influencing Minima in SGD
arXiv:1711.04623, 13 November 2017
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

Papers citing "Three Factors Influencing Minima in SGD" (50 / 106 papers shown)
Class Imbalance in Anomaly Detection: Learning from an Exactly Solvable Model. F.S. Pezzicoli, V. Ros, F.P. Landes, M. Baity-Jesi. 20 Jan 2025.
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024. [AAML]
The Optimization Landscape of SGD Across the Feature Learning Strength. Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan. 06 Oct 2024.
Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD. Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio. 17 Jun 2024. [MLT]
Agnostic Sharpness-Aware Minimization. Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Q. Phung, Trung Le. 11 Jun 2024.
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability. Atish Agarwala, Jeffrey Pennington. 30 Apr 2024.
A PAC-Bayesian Link Between Generalisation and Flat Minima. Maxime Haddouche, Paul Viallard, Umut Simsekli, Benjamin Guedj. 13 Feb 2024.
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. Kaiyue Wen, Zhiyuan Li, Tengyu Ma. 20 Jul 2023. [FAtt]
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances. Marcel Kühn, B. Rosenow. 08 Jun 2023.
Machine learning in and out of equilibrium. Shishir Adhikari, Alkan Kabakçıoğlu, A. Strang, Deniz Yuret, M. Hinczewski. 06 Jun 2023.
GeNAS: Neural Architecture Search with Better Generalization. Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo. 15 May 2023.
Learning Trajectories are Generalization Indicators. Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng. 25 Apr 2023. [AI4CE]
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023. [AAML]
A Modern Look at the Relationship between Sharpness and Generalization. Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion. 14 Feb 2023. [3DH]
Generalization Bounds with Data-dependent Fractal Dimensions. Benjamin Dupuis, George Deligiannidis, Umut Simsekli. 06 Feb 2023. [AI4CE]
Tighter Information-Theoretic Generalization Bounds from Supersamples. Ziqiao Wang, Yongyi Mao. 05 Feb 2023.
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning. Antonio Sclocchi, Mario Geiger, M. Wyart. 31 Jan 2023.
An SDE for Modeling SAM: Theory and Insights. Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi. 19 Jan 2023.
Stability Analysis of Sharpness-Aware Minimization. Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee. 16 Jan 2023.
On the Overlooked Structure of Stochastic Gradients. Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li. 05 Dec 2022.
Disentangling the Mechanisms Behind Implicit Regularization in SGD. Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton. 29 Nov 2022. [FedML]
A survey of deep learning optimizers -- first and second order methods. Rohan Kashyap. 28 Nov 2022. [ODL]
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Ziqiao Wang, Yongyi Mao. 19 Nov 2022.
How Does Sharpness-Aware Minimization Minimize Sharpness? Kaiyue Wen, Tengyu Ma, Zhiyuan Li. 10 Nov 2022. [AAML]
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models. Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. 25 Oct 2022. [AI4CE]
How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization. Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, A. Wilson. 12 Oct 2022.
On the Implicit Bias in Deep-Learning Algorithms. Gal Vardi. 26 Aug 2022. [FedML, AI4CE]
On the generalization of learning algorithms that do not converge. N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka. 16 Aug 2022. [MLT]
Scaling ResNets in the Large-depth Regime. Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert. 14 Jun 2022.
Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion. Chengli Tan, Jiang Zhang, Junmin Liu. 09 Jun 2022.
Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile. Dong Chen, Lingfei Wu, Siliang Tang, Xiao Yun, Bo Long, Yueting Zhuang. 04 Jun 2022. [VLM, NoLa]
Attack-Agnostic Adversarial Detection. Jiaxin Cheng, Mohamed Hussein, J. Billa, Wael AbdAlmageed. 01 Jun 2022. [AAML]
The Effect of Task Ordering in Continual Learning. Samuel J. Bell, Neil D. Lawrence. 26 May 2022. [CLL]
How catastrophic can catastrophic forgetting be in linear regression? Itay Evron, E. Moroshko, Rachel A. Ward, Nati Srebro, Daniel Soudry. 19 May 2022. [CLL]
Impact of Learning Rate on Noise Resistant Property of Deep Learning Models. Omobayode Fagbohungbe, Lijun Qian. 08 May 2022.
An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. Sayar Karmakar, Anirbit Mukherjee. 26 Apr 2022.
Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.
Tackling benign nonconvexity with smoothing and stochastic gradients. Harsh Vardhan, Sebastian U. Stich. 18 Feb 2022.
Optimal learning rate schedules in high-dimensional non-convex optimization problems. Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli. 09 Feb 2022.
On the Power-Law Hessian Spectrums in Deep Learning. Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li. 31 Jan 2022. [ODL]
Class-Incremental Continual Learning into the eXtended DER-verse. Matteo Boschini, Lorenzo Bonicelli, Pietro Buzzega, Angelo Porrello, Simone Calderara. 03 Jan 2022. [CLL, BDL]
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective. Xiaowu Dai, Yuhua Zhu. 02 Dec 2021.
Exponential escape efficiency of SGD from sharp minima in non-stationary regime. Hikaru Ibayashi, Masaaki Imaizumi. 07 Nov 2021.
Large-Scale Deep Learning Optimizations: A Comprehensive Survey. Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You. 01 Nov 2021.
Does Momentum Help? A Sample Complexity Analysis. Swetha Ganesh, Rohan Deb, Gugan Thoppe, A. Budhiraja. 29 Oct 2021.
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations. Jiayao Zhang, Hua Wang, Weijie J. Su. 11 Oct 2021.
A Loss Curvature Perspective on Training Instability in Deep Learning. Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat. 08 Oct 2021. [ODL]
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021. [AI4CE]
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications. Ziqiao Wang, Yongyi Mao. 07 Oct 2021. [FedML, MLT]