A Mean Field Theory of Batch Normalization
arXiv:1902.08129

21 February 2019
Greg Yang
Jeffrey Pennington
Vinay Rao
Jascha Narain Sohl-Dickstein
S. Schoenholz
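
For context on what the headline paper studies: it develops a mean-field theory of deep networks that use the standard batch normalization transform of Ioffe and Szegedy (2015), in which each feature is normalized by statistics computed across the batch. The NumPy sketch below illustrates only that transform; it is not code from the paper, and the function name, shapes, and epsilon value are illustrative.

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features) pre-activations; gamma, beta: (features,)
    # learnable scale and shift applied after normalization.
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize; eps guards against var = 0
    return gamma * x_hat + beta            # learnable affine transform

# Usage: a batch of 64 samples at one 256-unit layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 256))
out = batchnorm_forward(x, gamma=np.ones(256), beta=np.zeros(256))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # each feature: ~0 mean, ~1 std

Note that because the mean and variance are computed across the batch, the output for one sample depends on every other sample in the batch; this coupling between samples is what distinguishes batch normalization from per-sample normalizers.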

Papers citing "A Mean Field Theory of Batch Normalization"

46 papers

AADNet: Exploring EEG Spatiotemporal Information for Fast and Accurate Orientation and Timbre Detection of Auditory Attention Based on A Cue-Masked Paradigm
Keren Shi, Xu Liu, Xue Yuan, Haijie Shang, Ruiting Dai, Hanbin Wang, Yunfa Fu, N. Jiang, Jiayuan He
08 Jan 2025

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann
29 May 2024

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
Yuan Cao, Difan Zou, Yuan-Fang Li, Quanquan Gu
20 Jun 2023

On the Weight Dynamics of Deep Normalized Networks
Christian H. X. Ali Mehmeti-Göpel, Michael Wand
01 Jun 2023

The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
Inyoung Paik, Jaesik Choi
23 Apr 2023

Understanding plasticity in neural networks
Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney
02 Mar 2023

On the Initialisation of Wide Low-Rank Feedforward Neural Networks
Thiziri Nait Saada, Jared Tanner
31 Jan 2023

Batch Normalization Explained
Randall Balestriero, Richard G. Baraniuk
29 Sep 2022

AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He, Darshil Doshi, Andrey Gromov
27 Jun 2022

Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets
Lily H. Zhang, Veronica Tozzo, J. Higgins, Rajesh Ranganath
23 Jun 2022

Fast Finite Width Neural Tangent Kernel
Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz
17 Jun 2022

Batch Normalization Is Blind to the First and Second Derivatives of the Loss
Zhanpeng Zhou, Wen Shen, Huixin Chen, Ling Tang, Quanshi Zhang
30 May 2022

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang, Aleksandar Botev, James Martens
15 Mar 2022

Revisiting Batch Norm Initialization
Jim Davis, Logan Frank
26 Oct 2021

Lottery Tickets with Nonzero Biases
Jonas Fischer, Advait Gadhikar, R. Burkholz
21 Oct 2021

A Loss Curvature Perspective on Training Instability in Deep Learning
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat
08 Oct 2021

Batch Normalization Preconditioning for Neural Network Training
Susanna Lange, Kyle E. Helfrich, Qiang Ye
02 Aug 2021

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang
16 Apr 2021

High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
11 Feb 2021

Improving Unsupervised Domain Adaptation by Reducing Bi-level Feature Redundancy
Mengzhu Wang, Xiang Zhang, L. Lan, Wei Wang, Huibin Tan, Zhigang Luo
28 Dec 2020

BYOL works even without batch statistics
Pierre Harvey Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, ..., Samuel L. Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko
20 Oct 2020

Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
28 Sep 2020

Tensor Programs III: Neural Matrix Laws
Greg Yang
22 Sep 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré
17 Aug 2020

Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Yaniv Blumenfeld, D. Gilboa, Daniel Soudry
02 Jul 2020

Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
25 Jun 2020

New Interpretations of Normalization Methods in Deep Learning
Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li
16 Jun 2020

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification
Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
10 Jun 2020

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
Jonathan Frankle, D. Schwab, Ari S. Morcos
29 Feb 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith
24 Feb 2020

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
12 Feb 2020

How Does BN Increase Collapsed Neural Network Filters?
Sheng Zhou, Xinjiang Wang, Ping Luo, Xue Jiang, Wenjie Li, Wei Zhang
30 Jan 2020

A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks
Zhaodong Chen, Lei Deng, Bangyan Wang, Guoqi Li, Yuan Xie
01 Jan 2020

Towards Efficient Training for Neural Network Quantization
Qing Jin, Linjie Yang, Zhenyu A. Liao
21 Dec 2019

Mean field theory for deep dropout networks: digging up gradient backpropagation deeply
Wei Huang, R. Xu, Weitao Du, Yutian Zeng, Yunce Zhao
19 Dec 2019

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
Greg Yang
28 Oct 2019

Switchable Normalization for Learning-to-Normalize Deep Representation
Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li
22 Jul 2019

A Signal Propagation Perspective for Pruning Neural Networks at Initialization
Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould, Philip Torr
14 Jun 2019

The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida, S. Akaho, S. Amari
07 Jun 2019

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization
Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
25 Mar 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington
18 Feb 2019

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs
D. Gilboa, B. Chang, Minmin Chen, Greg Yang, S. Schoenholz, Ed H. Chi, Jeffrey Pennington
25 Jan 2019

Information Geometry of Orthogonal Initializations and Training
Piotr A. Sokól, Il-Su Park
09 Oct 2018

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington
14 Jun 2018