Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama
arXiv:2103.17182, 31 March 2021
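For readers skimming the citation list, a rough sense of what the cited paper proposes: positive-negative momentum (PNM) keeps two momentum buffers that are refreshed on alternating steps and combines them with a positive weight (1 + beta0) on the fresher buffer and a negative weight -beta0 on the staler one, which amplifies stochastic gradient noise while leaving the expected descent direction unchanged. The NumPy sketch below only illustrates that idea; the function name pnm_sgd, the beta1**2 accumulation, and the normalization constant are assumptions made for this sketch, not a transcription of the paper's exact algorithm.

```python
import numpy as np

def pnm_sgd(grad_fn, theta, lr=0.1, beta1=0.9, beta0=1.0, steps=100):
    """Illustrative positive-negative momentum (PNM) update (sketch only).

    grad_fn(theta) should return a stochastic gradient at theta.
    Two momentum buffers are refreshed on alternating steps; the update
    combines them with weights (1 + beta0) and -beta0, which enlarges the
    gradient-noise component of each step while keeping its mean direction.
    """
    m = [np.zeros_like(theta), np.zeros_like(theta)]  # alternating momentum buffers
    # Assumed normalization so the step scale stays comparable to plain momentum.
    norm = np.sqrt((1 + beta0) ** 2 + beta0 ** 2)
    for t in range(steps):
        g = grad_fn(theta)
        cur, prev = t % 2, (t + 1) % 2  # which buffer receives this step's gradient
        m[cur] = beta1 ** 2 * m[cur] + (1 - beta1 ** 2) * g
        update = (1 + beta0) * m[cur] - beta0 * m[prev]
        theta = theta - lr * update / norm
    return theta

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2 should drive theta toward 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(pnm_sgd(noisy_grad, np.ones(3), lr=0.05, steps=200))
```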
Papers citing "Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization" (20 of 20 papers shown):
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
83
2
0
10 Feb 2025
Computational Analysis of Yaredawi YeZema Silt in Ethiopian Orthodox Tewahedo Church Chants
Mequanent Argaw Muluneh
Yan-Tsung Peng
Li Su
40
0
0
25 Dec 2024
Transfer Learning with Active Sampling for Rapid Training and Calibration in BCI-P300 Across Health States and Multi-centre Data
Christian Flores
Marcelo Contreras
Ichiro Macedo
Javier Andreu-Perez
OOD
29
0
0
14 Dec 2024
Neural Field Classifiers via Target Encoding and Classification Loss
Xindi Yang
Zeke Xie
Xiong Zhou
Boyu Liu
Buhua Liu
Yi Liu
Haoran Wang
Yunfeng Cai
Mingming Sun
36
0
0
02 Mar 2024
The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
42
8
0
27 Jul 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
24
4
0
05 Jun 2023
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
23
6
0
05 Dec 2022
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack
Simran Kaur
Tanya Marwah
Saurabh Garg
Zachary Chase Lipton
FedML
27
2
0
29 Nov 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang
Yongyi Mao
13
10
0
19 Nov 2022
Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He
Zeke Xie
Quanzhi Zhu
Zengchang Qin
67
27
0
17 Jun 2022
Investigating Neural Architectures by Synthetic Dataset Design
Adrien Courtois
Jean-Michel Morel
Pablo Arias
17
4
0
23 Apr 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang
Boqing Gong
Liangzhe Yuan
Yin Cui
Hartwig Adam
Nicha Dvornek
S. Tatikonda
James Duncan
Ting Liu
14
146
0
15 Mar 2022
MSTGD:A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate
Aixiang Chen
Chen
Jinting Zhang
Zanbo Zhang
Zhihong Li
30
0
0
21 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
42
8
0
31 Jan 2022
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
32
22
0
07 Oct 2021
Ranger21: a synergistic deep learning optimizer
Less Wright
Nestor Demeure
ODL
AI4CE
14
85
0
25 Jun 2021
On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie
Zhiqiang Xu
Jingzhao Zhang
Issei Sato
Masashi Sugiyama
9
20
0
23 Nov 2020
Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting
Zeke Xie
Fengxiang He
Shaopeng Fu
Issei Sato
Dacheng Tao
Masashi Sugiyama
15
59
0
12 Nov 2020
Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
Zeke Xie
Xinrui Wang
Huishuai Zhang
Issei Sato
Masashi Sugiyama
ODL
19
45
0
29 Jun 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
275
2,888
0
15 Sep 2016