1905.05894
Online Normalization for Training Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
15 May 2019
Vitaliy Chiley
I. Sharapov
Atli Kosson
Urs Koster
R. Reece
S. D. L. Fuente
Vishal Subbiah
Michael James
OnRL
Papers citing
"Online Normalization for Training Neural Networks"
35 / 35 papers shown
Title
Weight Decay may matter more than muP for Learning Rate Transfer in Practice
Atli Kosson
Jeremy Welborn
Yang Liu
Martin Jaggi
Xi Chen
48
1
0
21 Oct 2025
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Aleksandr Dremov
Alexander Hägele
Atli Kosson
Martin Jaggi
124
1
0
02 Aug 2025
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
299
13
0
19 May 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
352
0
0
22 Apr 2025
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Neural Information Processing Systems (NeurIPS), 2024
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
204
14
0
31 Oct 2024
Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
Shaobo Wang
Xiangdong Zhang
Dongrui Liu
Junchi Yan
263
1
0
27 Nov 2023
Maintaining Plasticity in Deep Continual Learning
Shibhansh Dohare
J. F. Hernandez-Garcia
Parash Rahman
A. Rupam Mahmood
Richard S. Sutton
KELM
CLL
305
36
0
23 Jun 2023
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
International Conference on Machine Learning (ICML), 2023
Atli Kosson
Bettina Messmer
Martin Jaggi
395
28
0
26 May 2023
Ghost Noise for Regularizing Deep Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Atli Kosson
Dongyang Fan
Martin Jaggi
230
2
0
26 May 2023
Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Neural Information Processing Systems (NeurIPS), 2022
Taiki Miyagawa
207
10
0
28 Oct 2022
SML:Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng
Lang Lang
Ziqiang Liu
B. Liu
152
0
0
09 Oct 2022
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Neural Information Processing Systems (NeurIPS), 2022
M. Kodryan
E. Lobacheva
M. Nakhodnov
Dmitry Vetrov
237
19
0
08 Sep 2022
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Conference on Machine Learning and Systems (MLSys), 2022
Vitaliy Chiley
Vithursan Thangarasa
Abhay Gupta
Anshul Samar
Joel Hestness
D. DeCoste
137
12
0
28 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
231
86
0
14 Jun 2022
Delving into the Estimation Shift of Batch Normalization in a Network
Computer Vision and Pattern Recognition (CVPR), 2022
Lei Huang
Yi Zhou
Tian Wang
Jie Luo
Xianglong Liu
BDL
188
25
0
21 Mar 2022
One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
H. Taherian
Sefik Emre Eskimez
Takuya Yoshioka
Huaming Wang
Zhuo Chen
Xuedong Huang
150
24
0
20 Oct 2021
Continual Backprop: Stochastic Gradient Descent with Persistent Randomness
Shibhansh Dohare
R. Sutton
A. R. Mahmood
CLL
258
90
0
13 Aug 2021
On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
Neural Information Processing Systems (NeurIPS), 2021
E. Lobacheva
M. Kodryan
Nadezhda Chirkova
A. Malinin
Dmitry Vetrov
258
27
0
29 Jun 2021
Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Neural Information Processing Systems (NeurIPS), 2021
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
259
44
0
10 Jun 2021
Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence
Neural Information Processing Systems (NeurIPS), 2021
A. Labatie
Dominic Masters
Zach Eaton-Rosen
Carlo Luschi
253
21
0
07 Jun 2021
Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
Dominic Masters
A. Labatie
Zach Eaton-Rosen
Carlo Luschi
273
13
0
07 Jun 2021
Stochastic Whitening Batch Normalization
Computer Vision and Pattern Recognition (CVPR), 2021
Shengdong Zhang
E. Nezhadarya
H. Fashandi
Jiayi Liu
Darin Graham
Mohak Shah
168
15
0
03 Jun 2021
Deep Unitary Convolutional Neural Networks
International Conference on Artificial Neural Networks (ICANN), 2021
Hao-Yuan Chang
Kang L. Wang
120
2
0
23 Feb 2021
A Projection Algorithm for the Unitary Weights
Hao-Yuan Chang
58
0
0
19 Feb 2021
Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning
Zeming Li
Songtao Liu
Jian Sun
573
16
0
19 Jan 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin
Javier Sagastuy-Breña
Surya Ganguli
Daniel L. K. Yamins
Hidenori Tanaka
317
88
0
08 Dec 2020
Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang
Yi Zhou
Li Liu
Fan Zhu
Ling Shao
334
24
0
28 Sep 2020
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Lei Huang
Jie Qin
Yi Zhou
Fan Zhu
Li Liu
Ling Shao
AI4CE
280
359
0
27 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
764
88
0
17 Sep 2020
Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD
Ruosi Wan
Zhanxing Zhu
Xiangyu Zhang
Jian Sun
130
11
0
15 Jun 2020
Pipelined Backpropagation at Scale: Training Large Models without Batches
Conference on Machine Learning and Systems (MLSys), 2020
Atli Kosson
Vitaliy Chiley
Abhinav Venigalla
Joel Hestness
Urs Koster
244
34
0
25 Mar 2020
Synaptic Metaplasticity in Binarized Neural Networks
Nature Communications (Nat Commun), 2020
Axel Laborieux
M. Ernoult
T. Hirtzlin
D. Querlioz
CLL
190
72
0
07 Mar 2020
Batch norm with entropic regularization turns deterministic autoencoders into generative models
Conference on Uncertainty in Artificial Intelligence (UAI), 2020
Amur Ghose
Abdullah M. Rashwan
Pascal Poupart
UQCV
196
8
0
25 Feb 2020
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
International Conference on Learning Representations (ICLR), 2020
Junjie Yan
Ruosi Wan
Xinming Zhang
Wei Zhang
Yichen Wei
Jian Sun
157
42
0
19 Jan 2020
The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Katherine L. Hermann
Ting Chen
Simon Kornblith
CVBM
269
21
0
20 Nov 2019