Online Normalization for Training Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
15 May 2019
Vitaliy Chiley, I. Sharapov, Atli Kosson, Urs Koster, R. Reece, S. D. L. Fuente, Vishal Subbiah, Michael James

Papers citing "Online Normalization for Training Neural Networks"

35 / 35 papers shown
Weight Decay may matter more than muP for Learning Rate Transfer in Practice
Atli Kosson, Jeremy Welborn, Yang Liu, Martin Jaggi, Xi Chen
21 Oct 2025

Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi
02 Aug 2025

Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
19 May 2025

AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
22 Apr 2025

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Neural Information Processing Systems (NeurIPS), 2024
Atli Kosson, Bettina Messmer, Martin Jaggi
31 Oct 2024

Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
Shaobo Wang, Xiangdong Zhang, Dongrui Liu, Junchi Yan
27 Nov 2023

Maintaining Plasticity in Deep Continual Learning
Shibhansh Dohare, J. F. Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton
23 Jun 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
International Conference on Machine Learning (ICML), 2023
Atli Kosson, Bettina Messmer, Martin Jaggi
26 May 2023

Ghost Noise for Regularizing Deep Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Atli Kosson, Dongyang Fan, Martin Jaggi
26 May 2023

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Neural Information Processing Systems (NeurIPS), 2022
Taiki Miyagawa
28 Oct 2022

SML:Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng, Lang Lang, Ziqiang Liu, B. Liu
09 Oct 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Neural Information Processing Systems (NeurIPS), 2022
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov
08 Sep 2022

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Conference on Machine Learning and Systems (MLSys), 2022
Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, D. DeCoste
28 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022

Delving into the Estimation Shift of Batch Normalization in a Network
Computer Vision and Pattern Recognition (CVPR), 2022
Lei Huang, Yi Zhou, Tian Wang, Jie Luo, Xianglong Liu
21 Mar 2022

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
H. Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang
20 Oct 2021

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness
Shibhansh Dohare, R. Sutton, A. R. Mahmood
13 Aug 2021

On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
Neural Information Processing Systems (NeurIPS), 2021
E. Lobacheva, M. Kodryan, Nadezhda Chirkova, A. Malinin, Dmitry Vetrov
29 Jun 2021

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Neural Information Processing Systems (NeurIPS), 2021
Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
10 Jun 2021

Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence
Neural Information Processing Systems (NeurIPS), 2021
A. Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi
07 Jun 2021

Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
Dominic Masters, A. Labatie, Zach Eaton-Rosen, Carlo Luschi
07 Jun 2021

Stochastic Whitening Batch Normalization
Computer Vision and Pattern Recognition (CVPR), 2021
Shengdong Zhang, E. Nezhadarya, H. Fashandi, Jiayi Liu, Darin Graham, Mohak Shah
03 Jun 2021

Deep Unitary Convolutional Neural Networks
International Conference on Artificial Neural Networks (ICANN), 2021
Hao-Yuan Chang, Kang L. Wang
23 Feb 2021

A Projection Algorithm for the Unitary Weights
Hao-Yuan Chang
19 Feb 2021

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning
Zeming Li, Songtao Liu, Jian Sun
19 Jan 2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
08 Dec 2020

Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
28 Sep 2020

Normalization Techniques in Training DNNs: Methodology, Analysis and Application
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
27 Sep 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD
Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun
15 Jun 2020

Pipelined Backpropagation at Scale: Training Large Models without Batches
Conference on Machine Learning and Systems (MLSys), 2020
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
25 Mar 2020

Synaptic Metaplasticity in Binarized Neural Networks
Nature Communications (Nat Commun), 2020
Axel Laborieux, M. Ernoult, T. Hirtzlin, D. Querlioz
07 Mar 2020

Batch norm with entropic regularization turns deterministic autoencoders into generative models
Conference on Uncertainty in Artificial Intelligence (UAI), 2020
Amur Ghose, Abdullah M. Rashwan, Pascal Poupart
25 Feb 2020

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
International Conference on Learning Representations (ICLR), 2020
Junjie Yan, Ruosi Wan, Xinming Zhang, Wei Zhang, Yichen Wei, Jian Sun
19 Jan 2020

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Katherine L. Hermann, Ting Chen, Simon Kornblith
20 Nov 2019