Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

arXiv:2007.13985 · 28 July 2020
Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
Topic: ODL
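For context, the paper's core idea is to accumulate momentum over stochastic gradients and then step along the *normalized* momentum direction, which decouples the step length from the gradient scale in large-batch training. Below is a minimal, illustrative sketch of such an update in PyTorch; the function name, hyperparameter values, and the use of a single global norm are assumptions for illustration, not the authors' reference implementation.

```python
# Minimal sketch of an SNGM-style update (normalized momentum SGD),
# after arXiv:2007.13985. Hyperparameters and the global-norm choice
# are assumptions; see the paper for the exact algorithm and analysis.
import torch

@torch.no_grad()
def sngm_step(params, momentum_bufs, lr=0.1, beta=0.9, eps=1e-12):
    # Accumulate momentum over the raw stochastic gradients: m <- beta*m + g
    for p, m in zip(params, momentum_bufs):
        m.mul_(beta).add_(p.grad)
    # Normalize by the global momentum norm so the step length is ~lr
    # regardless of gradient scale (the point of SNGM for large batches).
    norm = torch.sqrt(sum((m * m).sum() for m in momentum_bufs))
    for p, m in zip(params, momentum_bufs):
        p.add_(m, alpha=-lr / (norm.item() + eps))
```

A training loop would call `loss.backward()` and then `sngm_step(params, bufs)`, where `bufs` is a list of zero-initialized tensors shaped like the parameters.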

Papers citing "Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training"

8 / 8 papers shown

• Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution
  Brandon Morgan, Dean Frederick Hougen · ODL · cited by 0 · 10 Apr 2024

• On the Optimal Batch Size for Byzantine-Robust Distributed Learning
  Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li · FedML, AAML · cited by 0 · 23 May 2023

• Revisiting Outer Optimization in Adversarial Training
  Ali Dabouei, Fariborz Taherkhani, Sobhan Soleymani, Nasser M. Nasrabadi · AAML · cited by 4 · 02 Sep 2022

• Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
  Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis · cited by 67 · 14 Jun 2022

• DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training
  Kun Yuan, Yiming Chen, Xinmeng Huang, Yingya Zhang, Pan Pan, Yinghui Xu, W. Yin · MoE · cited by 60 · 24 Apr 2021

• Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
  Robin M. Schmidt, Frank Schneider, Philipp Hennig · ODL · cited by 161 · 03 Jul 2020

• Global Momentum Compression for Sparse Communication in Distributed Learning
  Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li · cited by 1 · 30 May 2019

• On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
  N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · cited by 2,888 · 15 Sep 2016