ResearchTrend.AI
© 2026 ResearchTrend.AI, All rights reserved.

A Mean Field Theory of Batch Normalization (arXiv:1902.08129, v2)

21 February 2019
Greg Yang
Jeffrey Pennington
Vinay Rao
Jascha Narain Sohl-Dickstein
S. Schoenholz

Papers citing "A Mean Field Theory of Batch Normalization"

50 / 115 papers shown
• On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. Yongliang Wu, Y. Zhou, Zhou Ziheng, Yingzhe Peng, Xinyu Ye, Xinting Hu, Wenbo Zhu, Lu Qi, Ming-Hsuan Yang, Xu Yang. 07 Aug 2025.
• ResNets Are Deeper Than You Think. Christian H.X. Ali Mehmeti-Göpel, Michael Wand. 17 Jun 2025.
• On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning. Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge, Federico Barbero, Claudio Gallicchio, Xiaowen Dong, Michael M. Bronstein, P. Vandergheynst. 15 Feb 2025.
• AADNet: Exploring EEG Spatiotemporal Information for Fast and Accurate Orientation and Timbre Detection of Auditory Attention Based on A Cue-Masked Paradigm (IEEE Transactions on Neural Systems and Rehabilitation Engineering (TNSRE), 2025). Keren Shi, Xu Liu, Xue Yuan, Haijie Shang, Ruiting Dai, Hanbin Wang, Yunfa Fu, N. Jiang, Jiayuan He. 08 Jan 2025.
• Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations (AISTATS, 2024). Amir Joudaki, Thomas Hofmann. 26 Oct 2024.
• Towards the Spectral Bias Alleviation by Normalizations in Coordinate Networks. Zhicheng Cai, Hao Zhu, Qiu Shen, Xinran Wang, Xun Cao. 25 Jul 2024.
• Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs. Michael Scholkemper, Xinyi Wu, Ali Jadbabaie, Michael T. Schaub. 05 Jun 2024.
• Understanding and Minimising Outlier Features in Neural Network Training. Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann. 29 May 2024.
• LayerNorm: A Key Component in Parameter-Efficient Fine-Tuning. Taha ValizadehAslani, Hualou Liang. 29 Mar 2024.
• Deep Neural Network Initialization with Sparsity Inducing Activations. Ilan Price, Nicholas Daultry Ball, Samuel C.H. Lam, Adam C. Jones, Jared Tanner. 25 Feb 2024.
• A2Q+: Improving Accumulator-Aware Weight Quantization. Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu. 19 Jan 2024.
• Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework. Shaobo Wang, Xiangdong Zhang, Dongrui Liu, Junchi Yan. 27 Nov 2023.
• Simplifying Transformer Blocks (ICLR, 2023). Bobby He, Thomas Hofmann. 03 Nov 2023.
• Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion (ICLR, 2023). Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand. 03 Oct 2023.
• Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay. Leyan Pan, Xinyuan Cao. 09 Sep 2023.
• The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks (COLT, 2023). Yuan Cao, Difan Zou, Yuan-Fang Li, Quanquan Gu. 20 Jun 2023.
• On the Weight Dynamics of Deep Normalized Networks (ICML, 2023). Christian H. X. Ali Mehmeti-Göpel, Michael Wand. 01 Jun 2023.
• On the Impact of Activation and Normalization in Obtaining Isometric Embeddings at Initialization (NeurIPS, 2023). Amir Joudaki, Hadi Daneshmand, Francis R. Bach. 28 May 2023.
• The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations. Inyoung Paik, Jaesik Choi. 23 Apr 2023.
• Picking Up Quantization Steps for Compressed Image Classification. Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian. 21 Apr 2023.
• Making Batch Normalization Great in Federated Deep Learning. Shitian Zhao, Hong-You Chen, Wei-Lun Chao. 12 Mar 2023.
• Understanding Plasticity in Neural Networks (ICML, 2023). Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney. 02 Mar 2023.
• The Expressive Power of Tuning Only the Normalization Layers (COLT, 2023). Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos. 15 Feb 2023.
• On the Initialisation of Wide Low-Rank Feedforward Neural Networks. Thiziri Nait Saada, Jared Tanner. 31 Jan 2023.
• Batch Normalization Explained. Randall Balestriero, Richard G. Baraniuk. 29 Sep 2022.
• Deep Maxout Network Gaussian Process. Libin Liang, Ye Tian, Ge Cheng. 08 Aug 2022.
• AutoInit: Automatic Initialization via Jacobian Tuning. Tianyu He, Darshil Doshi, Andrey Gromov. 27 Jun 2022.
• Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets (ICML, 2022). Lily H. Zhang, Veronica Tozzo, J. Higgins, Rajesh Ranganath. 23 Jun 2022.
• Fast Finite Width Neural Tangent Kernel (ICML, 2022). Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz. 17 Jun 2022.
• Batch Normalization Is Blind to the First and Second Derivatives of the Loss (AAAI, 2022). Zhanpeng Zhou, Wen Shen, Huixin Chen, Ling Tang, Quanshi Zhang. 30 May 2022.
• On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization. Amir Joudaki, Hadi Daneshmand, Francis R. Bach. 25 May 2022.
• Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers (ICLR, 2022). Guodong Zhang, Aleksandar Botev, James Martens. 15 Mar 2022.
• Training BatchNorm Only in Neural Architecture Search and Beyond. Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang. 01 Dec 2021.
• Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications. Darshil Doshi, Tianyu He, Andrey Gromov. 23 Nov 2021.
• Revisiting Batch Norm Initialization. Jim Davis, Logan Frank. 26 Oct 2021.
• Lottery Tickets with Nonzero Biases. Jonas Fischer, Advait Gadhikar, R. Burkholz. 21 Oct 2021.
• UniFed: A Unified Framework for Federated Learning on Non-IID Image Features. Meirui Jiang, Xiaoxiao Li, Xiaofei Zhang, Michael Kamp, Qianming Dou. 19 Oct 2021.
• A Loss Curvature Perspective on Training Instability in Deep Learning. Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat. 08 Oct 2021.
• Batch Normalization Preconditioning for Neural Network Training. Susanna Lange, Kyle E. Helfrich, Qiang Ye. 02 Aug 2021.
• On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay (NeurIPS, 2021). E. Lobacheva, M. Kodryan, Nadezhda Chirkova, A. Malinin, Dmitry Vetrov. 29 Jun 2021.
• Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning (NeurIPS, 2021). Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka. 10 Jun 2021.
• Batch Normalization Orthogonalizes Representations in Deep Random Networks (NeurIPS, 2021). Hadi Daneshmand, Amir Joudaki, Francis R. Bach. 07 Jun 2021.
• Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks. Antonio Orvieto, Jonas Köhler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi. 07 Jun 2021.
• Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence (NeurIPS, 2021). A. Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi. 07 Jun 2021.
• "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization. Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zinan Lin. 16 Apr 2021.
• High-Performance Large-Scale Image Recognition Without Normalization (ICML, 2021). Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan. 11 Feb 2021.
• Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets (ICLR, 2021). Andrew Brock, Soham De, Samuel L. Smith. 21 Jan 2021.
• Improving Unsupervised Domain Adaptation by Reducing Bi-level Feature Redundancy. Mengzhu Wang, Xiang Zhang, L. Lan, Wei Wang, Huibin Tan, Zhigang Luo. 28 Dec 2020.
• Batch Group Normalization. Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye, Xu Lan, Qijun Luo, Bolin Lai, P. Esperança, Guang-Zhong Yang, Zhenguo Li. 04 Dec 2020.
• Feature Learning in Infinite-Width Neural Networks. Greg Yang, J. E. Hu. 30 Nov 2020.
Page 1 of 3