1712.07628
Improving Generalization Performance by Switching from Adam to SGD
20 December 2017
N. Keskar
R. Socher
ODL
Papers citing
"Improving Generalization Performance by Switching from Adam to SGD"
37 / 37 papers shown
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
X. Zhang
Yue Shang
Ge Zhang
AI4CE
60
0
0
17 Mar 2025
Adapter-Enhanced Semantic Prompting for Continual Learning
Baocai Yin
Ji Zhao
Huajie Jiang
Ningning Hou
Yongli Hu
Amin Beheshti
Ming-Hsuan Yang
Yuankai Qi
CLL
VLM
97
0
0
15 Dec 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
33
3
0
18 Oct 2024
HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
Junhao Su
Chenghao He
Feiyu Zhu
Xiaojie Xu
Dongzhi Guan
Chenyang Si
50
2
0
08 Jul 2024
Variational Stochastic Gradient Descent for Deep Neural Networks
Haotian Chen
Anna Kuzina
Babak Esmaeili
Jakub M. Tomczak
47
0
0
09 Apr 2024
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
27
4
0
02 Jul 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
37
1
0
24 Mar 2023
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize
Mert Gurbuzbalaban
Yuanhan Hu
Umut Simsekli
Lingjiong Zhu
LRM
13
1
0
10 Feb 2023
A Closer Look at Smoothness in Domain Adversarial Training
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
Arihant Jain
R. Venkatesh Babu
27
119
0
16 Jun 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
15
20
0
28 May 2022
MolMiner: You only look once for chemical structure recognition
Youjun Xu
Jinchuan Xiao
Chia-Han Chou
Jianhang Zhang
Jintao Zhu
...
Zhen Zhang
Shuhao Zhang
Weilin Zhang
L. Lai
Jianfeng Pei
21
19
0
23 May 2022
An Adaptive Gradient Method with Energy and Momentum
Hailiang Liu
Xuping Tian
ODL
16
9
0
23 Mar 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli
Maria Refinetti
Giulio Biroli
16
7
0
09 Feb 2022
Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization
Tao Sun
Huaming Ling
Zuoqiang Shi
Dongsheng Li
Bao Wang
ODL
19
13
0
18 Oct 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
44
37
0
25 Aug 2021
Physics-constrained Deep Learning for Robust Inverse ECG Modeling
Jianxin Xie
B. Yao
27
21
0
26 Jul 2021
Coconut trees detection and segmentation in aerial imagery using mask region-based convolution neural network
M. Iqbal
Hazrat Ali
Son N. Tran
Talha Iqbal
15
41
0
10 May 2021
The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
Bohan Wang
Qi Meng
Wei Chen
Tie-Yan Liu
20
33
0
11 Dec 2020
A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol
Nicholas P. Baskerville
AI4CE
ODL
24
2
0
15 Nov 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
51
1,276
0
03 Oct 2020
When Does Preconditioning Help or Hurt Generalization?
S. Amari
Jimmy Ba
Roger C. Grosse
Xuechen Li
Atsushi Nitanda
Taiji Suzuki
Denny Wu
Ji Xu
34
32
0
18 Jun 2020
To Each Optimizer a Norm, To Each Norm its Generalization
Sharan Vaswani
Reza Babanezhad
Jose Gallego
Aaron Mishkin
Simon Lacoste-Julien
Nicolas Le Roux
24
8
0
11 Jun 2020
Flexible numerical optimization with ensmallen
Ryan R. Curtin
Marcus Edel
Rahul Prabhu
S. Basak
Zhihao Lou
Conrad Sanderson
14
1
0
09 Mar 2020
Iterative Averaging in the Quest for Best Test Error
Diego Granziol
Xingchen Wan
Samuel Albanie
Stephen J. Roberts
8
3
0
02 Mar 2020
Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia
Hao Su
17
19
0
19 Nov 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding
Xuancheng Ren
Ruixuan Luo
Xu Sun
ODL
11
46
0
27 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
51
70
0
09 Oct 2019
The Channel Attention based Context Encoder Network for Inner Limiting Membrane Detection
H. Qiu
Zaiwang Gu
Lei Mou
Xiaoqian Mao
Liyang Fang
Yitian Zhao
Jiang-Dong Liu
Jun Cheng
20
0
0
09 Aug 2019
DEAM: Adaptive Momentum with Discriminative Weight for Stochastic Optimization
Jiyang Bai
Yuxiang Ren
Jiawei Zhang
ODL
18
1
0
25 Jul 2019
CE-Net: Context Encoder Network for 2D Medical Image Segmentation
Zaiwang Gu
Jun Cheng
H. Fu
Kang Zhou
Huaying Hao
Yitian Zhao
Tianyang Zhang
Shenghua Gao
Jiang-Dong Liu
SSeg
15
1,614
0
07 Mar 2019
Optimal Adaptive and Accelerated Stochastic Gradient Descent
Qi Deng
Yi Cheng
Guanghui Lan
8
8
0
01 Oct 2018
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Zhiming Zhou
Qingru Zhang
Guansong Lu
Hongwei Wang
Weinan Zhang
Yong Yu
16
66
0
29 Sep 2018
On the Generalization of Stochastic Gradient Descent with Momentum
Ali Ramezani-Kebrya
Kimon Antonakopoulos
V. Cevher
Ashish Khisti
Ben Liang
MLT
12
23
0
12 Sep 2018
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen
Dongruo Zhou
Yiqi Tang
Ziyan Yang
Yuan Cao
Quanquan Gu
ODL
19
192
0
18 Jun 2018
SdcNet: A Computation-Efficient CNN for Object Recognition
Yunlong Ma
Chunyan Wang
21
3
0
03 May 2018
Remote Detection of Idling Cars Using Infrared Imaging and Deep Networks
M. Bastan
Kim-Hui Yap
Lap-Pui Chau
32
6
0
28 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016