Online Normalization for Training Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
15 May 2019
Vitaliy Chiley, I. Sharapov, Atli Kosson, Urs Koster, R. Reece, S. D. L. Fuente, Vishal Subbiah, Michael James

Papers citing "Online Normalization for Training Neural Networks"

35 / 35 papers shown
Weight Decay may matter more than muP for Learning Rate Transfer in Practice
Atli Kosson, Jeremy Welborn, Yang Liu, Martin Jaggi, Xi Chen
21 Oct 2025

Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi
02 Aug 2025

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
19 May 2025

AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
22 Apr 2025

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Neural Information Processing Systems (NeurIPS), 2024
Atli Kosson, Bettina Messmer, Martin Jaggi
31 Oct 2024

Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
Shaobo Wang, Xiangdong Zhang, Dongrui Liu, Junchi Yan
27 Nov 2023

Maintaining Plasticity in Deep Continual Learning
Shibhansh Dohare, J. F. Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton
23 Jun 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
International Conference on Machine Learning (ICML), 2023
Atli Kosson, Bettina Messmer, Martin Jaggi
26 May 2023

Ghost Noise for Regularizing Deep Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Atli Kosson, Dongyang Fan, Martin Jaggi
26 May 2023

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Neural Information Processing Systems (NeurIPS), 2022
Taiki Miyagawa
28 Oct 2022

SML:Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng, Lang Lang, Ziqiang Liu, B. Liu
09 Oct 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Neural Information Processing Systems (NeurIPS), 2022
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov
08 Sep 2022

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Conference on Machine Learning and Systems (MLSys), 2022
Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, D. DeCoste
28 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022

Delving into the Estimation Shift of Batch Normalization in a Network
Computer Vision and Pattern Recognition (CVPR), 2022
Lei Huang, Yi Zhou, Tian Wang, Jie Luo, Xianglong Liu
21 Mar 2022

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
H. Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang
20 Oct 2021

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness
Shibhansh Dohare, R. Sutton, A. R. Mahmood
13 Aug 2021

On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
Neural Information Processing Systems (NeurIPS), 2021
E. Lobacheva, M. Kodryan, Nadezhda Chirkova, A. Malinin, Dmitry Vetrov
29 Jun 2021

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Neural Information Processing Systems (NeurIPS), 2021
Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
10 Jun 2021

Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence
Neural Information Processing Systems (NeurIPS), 2021
A. Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi
07 Jun 2021

Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
Dominic Masters, A. Labatie, Zach Eaton-Rosen, Carlo Luschi
07 Jun 2021

Stochastic Whitening Batch Normalization
Computer Vision and Pattern Recognition (CVPR), 2021
Shengdong Zhang, E. Nezhadarya, H. Fashandi, Jiayi Liu, Darin Graham, Mohak Shah
03 Jun 2021

Deep Unitary Convolutional Neural Networks
International Conference on Artificial Neural Networks (ICANN), 2021
Hao-Yuan Chang, Kang L. Wang
23 Feb 2021

A Projection Algorithm for the Unitary Weights
Hao-Yuan Chang
19 Feb 2021

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning
Zeming Li, Songtao Liu, Jian Sun
19 Jan 2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
08 Dec 2020

Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
28 Sep 2020

Normalization Techniques in Training DNNs: Methodology, Analysis and Application
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
27 Sep 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD
Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun
15 Jun 2020

Pipelined Backpropagation at Scale: Training Large Models without Batches
Conference on Machine Learning and Systems (MLSys), 2020
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
25 Mar 2020

Synaptic Metaplasticity in Binarized Neural Networks
Nature Communications (Nat Commun), 2020
Axel Laborieux, M. Ernoult, T. Hirtzlin, D. Querlioz
07 Mar 2020

Batch norm with entropic regularization turns deterministic autoencoders into generative models
Conference on Uncertainty in Artificial Intelligence (UAI), 2020
Amur Ghose, Abdullah M. Rashwan, Pascal Poupart
25 Feb 2020

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
International Conference on Learning Representations (ICLR), 2020
Junjie Yan, Ruosi Wan, Xinming Zhang, Wei Zhang, Yichen Wei, Jian Sun
19 Jan 2020

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Katherine L. Hermann, Ting Chen, Simon Kornblith
20 Nov 2019