Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

18 June 2018 · arXiv: 1806.06763
Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu
ODL

Papers citing "Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks"

26 of 26 citing papers shown
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse
10 Feb 2025

Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat, Anirbit Mukherjee, Procheta Sen, Mingfei Sun, Omar Rivasplata
MLT · 12 Apr 2024

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong, Junhong Lin
06 Feb 2024

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, B. Kveton, V. Cevher
17 Jan 2024

Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding, Jingyang Li, Kim-Chuan Toh
26 Jun 2023

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods
Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabás Póczos
ODL, AI4CE · 08 Nov 2022

Critical Batch Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
ODL · 21 Aug 2022

Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhimin Luo
20 Aug 2022

Training neural networks using Metropolis Monte Carlo and an adaptive variant
S. Whitelam, V. Selin, Ian Benlolo, Corneel Casert, Isaac Tamblyn
BDL · 16 May 2022

Communication-Efficient Adaptive Federated Learning
Yujia Wang, Lu Lin, Jinghui Chen
FedML · 05 May 2022

Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli
09 Feb 2022

A Novel Convergence Analysis for Algorithms of the Adam Family
Zhishuai Guo, Yi Tian Xu, W. Yin, R. L. Jin, Tianbao Yang
07 Dec 2021

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
MLT, AI4CE · 25 Aug 2021

A New Adaptive Gradient Method with Gradient Decomposition
Zhou Shao, Tong Lin
ODL · 18 Jul 2021

KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan, Sepehr Sameni, L. Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro
ODL · 07 Jul 2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama
31 Mar 2021

A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville
AI4CE, ODL · 15 Nov 2020

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov
14 Oct 2020

Effective Federated Adaptive Gradient Methods with Non-IID Decentralized Data
Qianqian Tong, Guannan Liang, J. Bi
FedML · 14 Sep 2020

A new regret analysis for Adam-type algorithms
Ahmet Alacaoglu, Yura Malitsky, P. Mertikopoulos, V. Cevher
ODL · 21 Mar 2020

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, E. D. Cubuk, Alexey Kurakin, Han Zhang, Colin Raffel
AAML · 21 Jan 2020

Learning Rate Dropout
Huangxing Lin, Weihong Zeng, Xinghao Ding, Yue Huang, Yihong Zhuang, John Paisley
ODL · 30 Nov 2019

Demon: Improved Neural Network Training with Momentum Decay
John Chen, Cameron R. Wolfe, Zhaoqi Li, Anastasios Kyrillidis
ODL · 11 Oct 2019

On the adequacy of untuned warmup for adaptive optimization
Jerry Ma, Denis Yarats
09 Oct 2019

Sequential Training of Neural Networks with Gradient Boosting
S. Emami, Gonzalo Martínez-Muñoz
ODL · 26 Sep 2019

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward, Xiaoxia Wu, Léon Bottou
ODL · 05 Jun 2018