ResearchTrend.AI
© 2026 ResearchTrend.AI, All rights reserved.

A Mean Field Theory of Batch Normalization (arXiv:1902.08129, v2)

21 February 2019
Greg Yang
Jeffrey Pennington
Vinay Rao
Jascha Narain Sohl-Dickstein
S. Schoenholz

Papers citing "A Mean Field Theory of Batch Normalization"

50 / 115 papers shown
• On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. Yongliang Wu, Y. Zhou, Zhou Ziheng, Yingzhe Peng, Xinyu Ye, Xinting Hu, Wenbo Zhu, Lu Qi, Ming-Hsuan Yang, Xu Yang. 07 Aug 2025.
• ResNets Are Deeper Than You Think. Christian H.X. Ali Mehmeti-Göpel, Michael Wand. 17 Jun 2025.
• On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning. Alvaro Arroyo, Alessio Gravina, Benjamin Gutteridge, Federico Barbero, Claudio Gallicchio, Xiaowen Dong, Michael M. Bronstein, P. Vandergheynst. 15 Feb 2025.
• AADNet: Exploring EEG Spatiotemporal Information for Fast and Accurate Orientation and Timbre Detection of Auditory Attention Based on A Cue-Masked Paradigm (IEEE Transactions on Neural Systems and Rehabilitation Engineering (TNSRE), 2025). Keren Shi, Xu Liu, Xue Yuan, Haijie Shang, Ruiting Dai, Hanbin Wang, Yunfa Fu, N. Jiang, Jiayuan He. 08 Jan 2025.
• Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations (AISTATS, 2024). Amir Joudaki, Thomas Hofmann. 26 Oct 2024.
• Towards the Spectral Bias Alleviation by Normalizations in Coordinate Networks. Zhicheng Cai, Hao Zhu, Qiu Shen, Xinran Wang, Xun Cao. 25 Jul 2024.
• Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs. Michael Scholkemper, Xinyi Wu, Ali Jadbabaie, Michael T. Schaub. 05 Jun 2024.
• Understanding and Minimising Outlier Features in Neural Network Training. Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann. 29 May 2024.
• LayerNorm: A Key Component in Parameter-Efficient Fine-Tuning. Taha ValizadehAslani, Hualou Liang. 29 Mar 2024.
• Deep Neural Network Initialization with Sparsity Inducing Activations. Ilan Price, Nicholas Daultry Ball, Samuel C.H. Lam, Adam C. Jones, Jared Tanner. 25 Feb 2024.
• A2Q+: Improving Accumulator-Aware Weight Quantization. Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu. 19 Jan 2024.
• Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework. Shaobo Wang, Xiangdong Zhang, Dongrui Liu, Junchi Yan. 27 Nov 2023.
• Simplifying Transformer Blocks (ICLR, 2023). Bobby He, Thomas Hofmann. 03 Nov 2023.
• Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion (ICLR, 2023). Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand. 03 Oct 2023.
• Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay. Leyan Pan, Xinyuan Cao. 09 Sep 2023.
• The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks (COLT, 2023). Yuan Cao, Difan Zou, Yuan-Fang Li, Quanquan Gu. 20 Jun 2023.
• On the Weight Dynamics of Deep Normalized Networks (ICML, 2023). Christian H. X. Ali Mehmeti-Göpel, Michael Wand. 01 Jun 2023.
• On the Impact of Activation and Normalization in Obtaining Isometric Embeddings at Initialization (NeurIPS, 2023). Amir Joudaki, Hadi Daneshmand, Francis R. Bach. 28 May 2023.
• The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations. Inyoung Paik, Jaesik Choi. 23 Apr 2023.
• Picking Up Quantization Steps for Compressed Image Classification. Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian. 21 Apr 2023.
• Making Batch Normalization Great in Federated Deep Learning. Shitian Zhao, Hong-You Chen, Wei-Lun Chao. 12 Mar 2023.
• Understanding Plasticity in Neural Networks (ICML, 2023). Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney. 02 Mar 2023.
• The Expressive Power of Tuning Only the Normalization Layers (COLT, 2023). Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos. 15 Feb 2023.
• On the Initialisation of Wide Low-Rank Feedforward Neural Networks. Thiziri Nait Saada, Jared Tanner. 31 Jan 2023.
• Batch Normalization Explained. Randall Balestriero, Richard G. Baraniuk. 29 Sep 2022.
• Deep Maxout Network Gaussian Process. Libin Liang, Ye Tian, Ge Cheng. 08 Aug 2022.
• AutoInit: Automatic Initialization via Jacobian Tuning. Tianyu He, Darshil Doshi, Andrey Gromov. 27 Jun 2022.
• Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets (ICML, 2022). Lily H. Zhang, Veronica Tozzo, J. Higgins, Rajesh Ranganath. 23 Jun 2022.
• Fast Finite Width Neural Tangent Kernel (ICML, 2022). Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz. 17 Jun 2022.
• Batch Normalization Is Blind to the First and Second Derivatives of the Loss (AAAI, 2022). Zhanpeng Zhou, Wen Shen, Huixin Chen, Ling Tang, Quanshi Zhang. 30 May 2022.
• On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization. Amir Joudaki, Hadi Daneshmand, Francis R. Bach. 25 May 2022.
• Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers (ICLR, 2022). Guodong Zhang, Aleksandar Botev, James Martens. 15 Mar 2022.
• Training BatchNorm Only in Neural Architecture Search and Beyond. Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang. 01 Dec 2021.
• Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications. Darshil Doshi, Tianyu He, Andrey Gromov. 23 Nov 2021.
• Revisiting Batch Norm Initialization. Jim Davis, Logan Frank. 26 Oct 2021.
• Lottery Tickets with Nonzero Biases. Jonas Fischer, Advait Gadhikar, R. Burkholz. 21 Oct 2021.
• UniFed: A Unified Framework for Federated Learning on Non-IID Image Features. Meirui Jiang, Xiaoxiao Li, Xiaofei Zhang, Michael Kamp, Qianming Dou. 19 Oct 2021.
• A Loss Curvature Perspective on Training Instability in Deep Learning. Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat. 08 Oct 2021.
• Batch Normalization Preconditioning for Neural Network Training. Susanna Lange, Kyle E. Helfrich, Qiang Ye. 02 Aug 2021.
• On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay (NeurIPS, 2021). E. Lobacheva, M. Kodryan, Nadezhda Chirkova, A. Malinin, Dmitry Vetrov. 29 Jun 2021.
• Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning (NeurIPS, 2021). Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka. 10 Jun 2021.
• Batch Normalization Orthogonalizes Representations in Deep Random Networks (NeurIPS, 2021). Hadi Daneshmand, Amir Joudaki, Francis R. Bach. 07 Jun 2021.
• Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks. Antonio Orvieto, Jonas Köhler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi. 07 Jun 2021.
• Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence (NeurIPS, 2021). A. Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi. 07 Jun 2021.
• "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization. Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zinan Lin. 16 Apr 2021.
• High-Performance Large-Scale Image Recognition Without Normalization (ICML, 2021). Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan. 11 Feb 2021.
• Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets (ICLR, 2021). Andrew Brock, Soham De, Samuel L. Smith. 21 Jan 2021.
• Improving Unsupervised Domain Adaptation by Reducing Bi-level Feature Redundancy. Mengzhu Wang, Xiang Zhang, L. Lan, Wei Wang, Huibin Tan, Zhigang Luo. 28 Dec 2020.
• Batch Group Normalization. Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye, Xu Lan, Qijun Luo, Bolin Lai, P. Esperança, Guang-Zhong Yang, Zhenguo Li. 04 Dec 2020.
• Feature Learning in Infinite-Width Neural Networks. Greg Yang, J. E. Hu. 30 Nov 2020.
Page 1 of 3