Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama
arXiv:2103.17182, 31 March 2021
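For readers skimming the citation list, a rough sense of what the cited paper proposes: positive-negative momentum (PNM) keeps two momentum buffers that are refreshed on alternating steps and combines them with a positive weight (1 + beta0) on the fresher buffer and a negative weight -beta0 on the staler one, which amplifies stochastic gradient noise while leaving the expected descent direction unchanged. The NumPy sketch below only illustrates that idea; the function name pnm_sgd, the beta1**2 accumulation, and the normalization constant are assumptions made for this sketch, not a transcription of the paper's exact algorithm.

```python
import numpy as np

def pnm_sgd(grad_fn, theta, lr=0.1, beta1=0.9, beta0=1.0, steps=100):
    """Illustrative positive-negative momentum (PNM) update (sketch only).

    grad_fn(theta) should return a stochastic gradient at theta.
    Two momentum buffers are refreshed on alternating steps; the update
    combines them with weights (1 + beta0) and -beta0, which enlarges the
    gradient-noise component of each step while keeping its mean direction.
    """
    m = [np.zeros_like(theta), np.zeros_like(theta)]  # alternating momentum buffers
    # Assumed normalization so the step scale stays comparable to plain momentum.
    norm = np.sqrt((1 + beta0) ** 2 + beta0 ** 2)
    for t in range(steps):
        g = grad_fn(theta)
        cur, prev = t % 2, (t + 1) % 2  # which buffer receives this step's gradient
        m[cur] = beta1 ** 2 * m[cur] + (1 - beta1 ** 2) * g
        update = (1 + beta0) * m[cur] - beta0 * m[prev]
        theta = theta - lr * update / norm
    return theta

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2 should drive theta toward 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(pnm_sgd(noisy_grad, np.ones(3), lr=0.05, steps=200))
```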
Papers citing "Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization" (20 of 20 papers shown):
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
83
2
0
10 Feb 2025
Computational Analysis of Yaredawi YeZema Silt in Ethiopian Orthodox Tewahedo Church Chants
Mequanent Argaw Muluneh
Yan-Tsung Peng
Li Su
40
0
0
25 Dec 2024
Transfer Learning with Active Sampling for Rapid Training and Calibration in BCI-P300 Across Health States and Multi-centre Data
Christian Flores
Marcelo Contreras
Ichiro Macedo
Javier Andreu-Perez
OOD
29
0
0
14 Dec 2024
Neural Field Classifiers via Target Encoding and Classification Loss
Xindi Yang
Zeke Xie
Xiong Zhou
Boyu Liu
Buhua Liu
Yi Liu
Haoran Wang
Yunfeng Cai
Mingming Sun
36
0
0
02 Mar 2024
The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
42
8
0
27 Jul 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
24
4
0
05 Jun 2023
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
23
6
0
05 Dec 2022
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack
Simran Kaur
Tanya Marwah
Saurabh Garg
Zachary Chase Lipton
FedML
27
2
0
29 Nov 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang
Yongyi Mao
13
10
0
19 Nov 2022
Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He
Zeke Xie
Quanzhi Zhu
Zengchang Qin
67
27
0
17 Jun 2022
Investigating Neural Architectures by Synthetic Dataset Design
Adrien Courtois
Jean-Michel Morel
Pablo Arias
17
4
0
23 Apr 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang
Boqing Gong
Liangzhe Yuan
Yin Cui
Hartwig Adam
Nicha Dvornek
S. Tatikonda
James Duncan
Ting Liu
14
146
0
15 Mar 2022
MSTGD:A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate
Aixiang Chen
Chen
Jinting Zhang
Zanbo Zhang
Zhihong Li
30
0
0
21 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
42
8
0
31 Jan 2022
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
32
22
0
07 Oct 2021
Ranger21: a synergistic deep learning optimizer
Less Wright
Nestor Demeure
ODL
AI4CE
14
85
0
25 Jun 2021
On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie
Zhiqiang Xu
Jingzhao Zhang
Issei Sato
Masashi Sugiyama
9
20
0
23 Nov 2020
Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting
Zeke Xie
Fengxiang He
Shaopeng Fu
Issei Sato
Dacheng Tao
Masashi Sugiyama
15
59
0
12 Nov 2020
Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum
Zeke Xie
Xinrui Wang
Huishuai Zhang
Issei Sato
Masashi Sugiyama
ODL
19
45
0
29 Jun 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
275
2,888
0
15 Sep 2016