Three Factors Influencing Minima in SGD
arXiv:1711.04623, 13 November 2017
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

Papers citing "Three Factors Influencing Minima in SGD" (50 / 106 papers shown)
Class Imbalance in Anomaly Detection: Learning from an Exactly Solvable Model. F.S. Pezzicoli, V. Ros, F.P. Landes, M. Baity-Jesi. 20 Jan 2025.
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024. [AAML]
The Optimization Landscape of SGD Across the Feature Learning Strength. Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan. 06 Oct 2024.
Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD. Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio. 17 Jun 2024. [MLT]
Agnostic Sharpness-Aware Minimization. Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Q. Phung, Trung Le. 11 Jun 2024.
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability. Atish Agarwala, Jeffrey Pennington. 30 Apr 2024.
A PAC-Bayesian Link Between Generalisation and Flat Minima. Maxime Haddouche, Paul Viallard, Umut Simsekli, Benjamin Guedj. 13 Feb 2024.
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. Kaiyue Wen, Zhiyuan Li, Tengyu Ma. 20 Jul 2023. [FAtt]
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances. Marcel Kühn, B. Rosenow. 08 Jun 2023.
Machine learning in and out of equilibrium. Shishir Adhikari, Alkan Kabakçıoğlu, A. Strang, Deniz Yuret, M. Hinczewski. 06 Jun 2023.
GeNAS: Neural Architecture Search with Better Generalization. Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo. 15 May 2023.
Learning Trajectories are Generalization Indicators. Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng. 25 Apr 2023. [AI4CE]
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023. [AAML]
A Modern Look at the Relationship between Sharpness and Generalization. Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion. 14 Feb 2023. [3DH]
Generalization Bounds with Data-dependent Fractal Dimensions. Benjamin Dupuis, George Deligiannidis, Umut Simsekli. 06 Feb 2023. [AI4CE]
Tighter Information-Theoretic Generalization Bounds from Supersamples. Ziqiao Wang, Yongyi Mao. 05 Feb 2023.
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning. Antonio Sclocchi, Mario Geiger, M. Wyart. 31 Jan 2023.
An SDE for Modeling SAM: Theory and Insights. Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi. 19 Jan 2023.
Stability Analysis of Sharpness-Aware Minimization. Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee. 16 Jan 2023.
On the Overlooked Structure of Stochastic Gradients. Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li. 05 Dec 2022.
Disentangling the Mechanisms Behind Implicit Regularization in SGD. Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton. 29 Nov 2022. [FedML]
A survey of deep learning optimizers -- first and second order methods. Rohan Kashyap. 28 Nov 2022. [ODL]
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Ziqiao Wang, Yongyi Mao. 19 Nov 2022.
How Does Sharpness-Aware Minimization Minimize Sharpness? Kaiyue Wen, Tengyu Ma, Zhiyuan Li. 10 Nov 2022. [AAML]
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models. Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. 25 Oct 2022. [AI4CE]
How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization. Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, A. Wilson. 12 Oct 2022.
On the Implicit Bias in Deep-Learning Algorithms. Gal Vardi. 26 Aug 2022. [FedML, AI4CE]
On the generalization of learning algorithms that do not converge. N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka. 16 Aug 2022. [MLT]
Scaling ResNets in the Large-depth Regime. Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert. 14 Jun 2022.
Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion. Chengli Tan, Jiang Zhang, Junmin Liu. 09 Jun 2022.
Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile. Dong Chen, Lingfei Wu, Siliang Tang, Xiao Yun, Bo Long, Yueting Zhuang. 04 Jun 2022. [VLM, NoLa]
Attack-Agnostic Adversarial Detection. Jiaxin Cheng, Mohamed Hussein, J. Billa, Wael AbdAlmageed. 01 Jun 2022. [AAML]
The Effect of Task Ordering in Continual Learning. Samuel J. Bell, Neil D. Lawrence. 26 May 2022. [CLL]
How catastrophic can catastrophic forgetting be in linear regression? Itay Evron, E. Moroshko, Rachel A. Ward, Nati Srebro, Daniel Soudry. 19 May 2022. [CLL]
Impact of Learning Rate on Noise Resistant Property of Deep Learning Models. Omobayode Fagbohungbe, Lijun Qian. 08 May 2022.
An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. Sayar Karmakar, Anirbit Mukherjee. 26 Apr 2022.
Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.
Tackling benign nonconvexity with smoothing and stochastic gradients. Harsh Vardhan, Sebastian U. Stich. 18 Feb 2022.
Optimal learning rate schedules in high-dimensional non-convex optimization problems. Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli. 09 Feb 2022.
On the Power-Law Hessian Spectrums in Deep Learning. Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li. 31 Jan 2022. [ODL]
Class-Incremental Continual Learning into the eXtended DER-verse. Matteo Boschini, Lorenzo Bonicelli, Pietro Buzzega, Angelo Porrello, Simone Calderara. 03 Jan 2022. [CLL, BDL]
On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective. Xiaowu Dai, Yuhua Zhu. 02 Dec 2021.
Exponential escape efficiency of SGD from sharp minima in non-stationary regime. Hikaru Ibayashi, Masaaki Imaizumi. 07 Nov 2021.
Large-Scale Deep Learning Optimizations: A Comprehensive Survey. Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You. 01 Nov 2021.
Does Momentum Help? A Sample Complexity Analysis. Swetha Ganesh, Rohan Deb, Gugan Thoppe, A. Budhiraja. 29 Oct 2021.
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations. Jiayao Zhang, Hua Wang, Weijie J. Su. 11 Oct 2021.
A Loss Curvature Perspective on Training Instability in Deep Learning. Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat. 08 Oct 2021. [ODL]
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021. [AI4CE]
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications. Ziqiao Wang, Yongyi Mao. 07 Oct 2021. [FedML, MLT]