An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg, Yuanzhi Li, Yang Yuan
arXiv:1802.06175 · 17 February 2018 · MLT
Papers citing "An Alternative View: When Does SGD Escape Local Minima?"
50 / 69 papers shown

- Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning — Gabriel S. Gama, Valdir Grassi Jr (MoMe) — 15 May 2025
- Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization — Dmitry Kovalev — 16 Mar 2025
- FOCUS: First Order Concentrated Updating Scheme — Yizhou Liu, Ziming Liu, Jeff Gore (ODL) — 21 Jan 2025
- Evolutionary algorithms as an alternative to backpropagation for supervised training of Biophysical Neural Networks and Neural ODEs — James Hazelden, Yuhan Helena Liu, Eli Shlizerman, E. Shea-Brown — 17 Nov 2023
- No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models — Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner — 12 Jul 2023
- How to escape sharp minima with random perturbations — Kwangjun Ahn, Ali Jadbabaie, S. Sra (ODL) — 25 May 2023
- On the Pareto Front of Multilingual Neural Machine Translation — Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang (MoE) — 06 Apr 2023
- Revisiting the Noise Model of Stochastic Gradient Descent — Barak Battash, Ofir Lindenbaum — 05 Mar 2023
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule — Maor Ivgi, Oliver Hinder, Y. Carmon (ODL) — 08 Feb 2023
- Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization — Hoki Kim, Jinseong Park, Yujin Choi, Woojin Lee, Jaewook Lee — 27 Jan 2023
- Stability Analysis of Sharpness-Aware Minimization — Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee — 16 Jan 2023
- From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent — Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan — 13 Oct 2022
- OCD: Learning to Overfit with Conditional Diffusion Models — Shahar Lutati, Lior Wolf (DiffM) — 02 Oct 2022
- On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks — Yizhou Liu, Weijie J. Su, Tongyang Li — 29 Sep 2022
- On the generalization of learning algorithms that do not converge — N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka (MLT) — 16 Aug 2022
- A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics — Lei Li, Yuliang Wang — 19 Jul 2022
- On uniform-in-time diffusion approximation for stochastic gradient descent — Lei Li, Yuliang Wang — 11 Jul 2022
- Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning — Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu — 15 Jun 2022
- Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction — Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora (FAtt) — 14 Jun 2022
- Perseus: A Simple and Optimal High-Order Method for Variational Inequalities — Tianyi Lin, Michael I. Jordan — 06 May 2022
- Byzantine Fault Tolerance in Distributed Machine Learning: A Survey — Djamila Bouhata, Hamouma Moumen, Moumen Hamouma, Ahcène Bounceur (AI4CE) — 05 May 2022
- Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes — Chao Ma, D. Kunin, Lei Wu, Lexing Ying — 24 Apr 2022
- Sharper Utility Bounds for Differentially Private Models — Yilin Kang, Yong Liu, Jian Li, Weiping Wang (FedML) — 22 Apr 2022
- Federated Minimax Optimization: Improved Convergence Analyses and Algorithms — Pranay Sharma, Rohan Panda, Gauri Joshi, P. Varshney (FedML) — 09 Mar 2022
- Boosting Mask R-CNN Performance for Long, Thin Forensic Traces with Pre-Segmentation and IoU Region Merging — Moritz Zink, M. Schiele, Pengcheng Fan, Stephan Gasterstädt (SSeg) — 08 Mar 2022
- Tackling benign nonconvexity with smoothing and stochastic gradients — Harsh Vardhan, Sebastian U. Stich — 18 Feb 2022
- Exact Solutions of a Deep Linear Network — Liu Ziyin, Botao Li, Xiangmin Meng (ODL) — 10 Feb 2022
- Anticorrelated Noise Injection for Improved Generalization — Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurelien Lucchi — 06 Feb 2022
- When Do Flat Minima Optimizers Work? — Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner (ODL) — 01 Feb 2022
- On generalization bounds for deep networks based on loss surface implicit regularization — Masaaki Imaizumi, Johannes Schmidt-Hieber (ODL) — 12 Jan 2022
- In Defense of the Unitary Scalarization for Deep Multi-Task Learning — Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. P. Kumar — 11 Jan 2022
- Exponential escape efficiency of SGD from sharp minima in non-stationary regime — Hikaru Ibayashi, Masaaki Imaizumi — 07 Nov 2021
- Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent — Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad — 21 Oct 2021
- Sharpness-Aware Minimization Improves Language Model Generalization — Dara Bahri, H. Mobahi, Yi Tay — 16 Oct 2021
- Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints — Shaojie Li, Yong Liu — 19 Jul 2021
- SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs — Satyen Kale, Ayush Sekhari, Karthik Sridharan — 11 Jul 2021
- Stochastic Polyak Stepsize with a Moving Target — Robert Mansel Gower, Aaron Defazio, Michael G. Rabbat — 22 Jun 2021
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities — Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, V. Samokhin, Sebastian U. Stich, Alexander Gasnikov — 15 Jun 2021
- Problem-solving benefits of down-sampled lexicase selection — Thomas Helmuth, Lee Spector — 10 Jun 2021
- An Efficient Algorithm for Deep Stochastic Contextual Bandits — Tan Zhu, Guannan Liang, Chunjiang Zhu, HaiNing Li, J. Bi — 12 Apr 2021
- Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search — Kartik Hegde, Po-An Tsai, Sitao Huang, Vikas Chandra, A. Parashar, Christopher W. Fletcher — 02 Mar 2021
- Consistent Sparse Deep Learning: Theory and Computation — Y. Sun, Qifan Song, F. Liang (BDL) — 25 Feb 2021
- Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency — Yuyang Deng, M. Mahdavi — 25 Feb 2021
- Stochastic Gradient Langevin Dynamics with Variance Reduction — Zhishen Huang, Stephen Becker — 12 Feb 2021
- A spin-glass model for the loss surfaces of generative adversarial networks — Nicholas P. Baskerville, J. Keating, F. Mezzadri, J. Najnudel (GAN) — 07 Jan 2021
- Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization — Jun-Kun Wang, Jacob D. Abernethy — 04 Oct 2020
- Solving Allen-Cahn and Cahn-Hilliard Equations using the Adaptive Physics Informed Neural Networks — Colby Wight, Jia Zhao — 09 Jul 2020
- SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation — Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou — 18 Jun 2020
- Geometry-Aware Gradient Algorithms for Neural Architecture Search — Liam Li, M. Khodak, Maria-Florina Balcan, Ameet Talwalkar — 16 Apr 2020
- A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima — Zeke Xie, Issei Sato, Masashi Sugiyama (ODL) — 10 Feb 2020