Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama
arXiv:2103.17182 · 31 March 2021
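
This page does not reproduce the method itself, so the snippet below is only a rough sketch of the idea the title points at: combine the current and previous momentum buffers with a positive weight (1 + beta0) and a negative weight -beta0, which amplifies the stochastic-gradient-noise component of the update. The coefficient names, the normalization factor, and the default values are illustrative assumptions, not the paper's exact algorithm (see arXiv:2103.17182 for that).

```python
import numpy as np

def pnm_like_step(params, grad, m_prev, lr=0.1, beta1=0.9, beta0=1.0):
    """One illustrative positive-negative momentum step (a sketch, not the paper's exact rule)."""
    m_new = beta1 * m_prev + (1.0 - beta1) * grad           # ordinary EMA momentum buffer
    direction = (1.0 + beta0) * m_new - beta0 * m_prev      # positive-negative combination of two buffers
    direction /= np.sqrt((1.0 + beta0) ** 2 + beta0 ** 2)   # assumed normalization to keep the step scale comparable
    return params - lr * direction, m_new

# Toy usage on f(w) = 0.5 * ||w||^2 with noisy gradients.
rng = np.random.default_rng(0)
w, m = np.array([2.0, -1.5]), np.zeros(2)
for _ in range(200):
    g = w + 0.1 * rng.normal(size=w.shape)                  # noisy gradient of the quadratic
    w, m = pnm_like_step(w, g, m)
print(w)                                                     # should end up near the origin
```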

Papers citing "Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization"

20 papers shown

Do we really have to filter out random noise in pre-training data for language models? (10 Feb 2025)
Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou

Computational Analysis of Yaredawi YeZema Silt in Ethiopian Orthodox Tewahedo Church Chants (25 Dec 2024)
Mequanent Argaw Muluneh, Yan-Tsung Peng, Li Su

Transfer Learning with Active Sampling for Rapid Training and Calibration in BCI-P300 Across Health States and Multi-centre Data (14 Dec 2024) [OOD]
Christian Flores, Marcelo Contreras, Ichiro Macedo, Javier Andreu-Perez

Neural Field Classifiers via Target Encoding and Classification Loss (02 Mar 2024)
Xindi Yang, Zeke Xie, Xiong Zhou, Boyu Liu, Buhua Liu, Yi Liu, Haoran Wang, Yunfeng Cai, Mingming Sun

The Marginal Value of Momentum for Small Learning Rate SGD (27 Jul 2023) [ODL]
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

Enhance Diffusion to Improve Robust Generalization (05 Jun 2023)
Jianhui Sun, Sanchit Sinha, Aidong Zhang

On the Overlooked Structure of Stochastic Gradients (05 Dec 2022)
Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li

Disentangling the Mechanisms Behind Implicit Regularization in SGD (29 Nov 2022) [FedML]
Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States (19 Nov 2022)
Ziqiao Wang, Yongyi Mao

Sparse Double Descent: Where Network Pruning Aggravates Overfitting (17 Jun 2022)
Zhengqi He, Zeke Xie, Quanzhi Zhu, Zengchang Qin

Investigating Neural Architectures by Synthetic Dataset Design (23 Apr 2022)
Adrien Courtois, Jean-Michel Morel, Pablo Arias

Surrogate Gap Minimization Improves Sharpness-Aware Training (15 Mar 2022)
Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, S. Tatikonda, James Duncan, Ting Liu

MSTGD: A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate (21 Feb 2022)
Aixiang Chen, Chen, Jinting Zhang, Zanbo Zhang, Zhihong Li

On the Power-Law Hessian Spectrums in Deep Learning (31 Jan 2022) [ODL]
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications (07 Oct 2021) [FedML, MLT]
Ziqiao Wang, Yongyi Mao

Ranger21: a synergistic deep learning optimizer (25 Jun 2021) [ODL, AI4CE]
Less Wright, Nestor Demeure

On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective (23 Nov 2020)
Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama

Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting (12 Nov 2020)
Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, Dacheng Tao, Masashi Sugiyama

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum (29 Jun 2020) [ODL]
Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (15 Sep 2016) [ODL]
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang