Mean Field Residual Networks: On the Edge of Chaos
Neural Information Processing Systems (NeurIPS), 2017
24 December 2017
Greg Yang
S. Schoenholz
arXiv: 1712.08969
Papers citing "Mean Field Residual Networks: On the Edge of Chaos"
Showing 50 of 130 papers
Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
Ankur Mali
Lawrence Hall
Jake Williams
Gordon Richards
103
0
0
09 Oct 2025
Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Haosong Zhang
Shenxi Wu
Yichi Zhang
Wei Lin
W. Lin
123
0
0
05 Oct 2025
Toward a Physics of Deep Learning and Brains
Arsham Ghavasieh
Meritxell Vila-Minana
Akanksha Khurd
John Beggs
Gerardo Ortiz
Santo Fortunato
64
1
0
26 Sep 2025
ResNets Are Deeper Than You Think
Christian H.X. Ali Mehmeti-Göpel
Michael Wand
184
1
0
17 Jun 2025
Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
Yihe Dong
Lorenzo Noci
Mikhail Khodak
Mufan Li
441
1
0
01 Jun 2025
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
Alessio Giorlandino
Sebastian Goldt
283
4
0
30 May 2025
GradAlign for Training-free Model Performance Inference
Yuxuan Li
Yunhui Guo
264
0
0
29 Nov 2024
Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo
Heeyoul Choi
277
3
0
21 Oct 2024
Collective variables of neural networks: empirical time evolution and scaling laws
S. Tovey
Sven Krippendorf
M. Spannowsky
Konstantin Nikolaou
Christian Holm
196
2
0
09 Oct 2024
UnitNorm: Rethinking Normalization for Transformers in Time Series
Nan Huang
C. Kümmerle
Xiang Zhang
AI4TS
238
4
0
24 May 2024
Principled Architecture-aware Scaling of Hyperparameters
Wuyang Chen
Junru Wu
Zhangyang Wang
Boris Hanin
AI4CE
298
2
0
27 Feb 2024
Deep Neural Network Initialization with Sparsity Inducing Activations
Ilan Price
Nicholas Daultry Ball
Samuel C.H. Lam
Adam C. Jones
Jared Tanner
AI4CE
185
2
0
25 Feb 2024
Neural Networks Asymptotic Behaviours for the Resolution of Inverse Problems
L. Del Debbio
Manuel Naviglio
Francesco Tarantelli
165
0
0
14 Feb 2024
Principled Weight Initialisation for Input-Convex Neural Networks
Pieter-Jan Hoedt
Günter Klambauer
218
11
0
19 Dec 2023
Commutative Width and Depth Scaling in Deep Neural Networks
Soufiane Hayou
222
2
0
02 Oct 2023
Fading memory as inductive bias in residual recurrent networks
Neural Networks (Neural Netw.), 2023
I. Dubinin
Felix Effenberger
220
9
0
27 Jul 2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Neural Information Processing Systems (NeurIPS), 2023
Lorenzo Noci
Chuning Li
Mufan Li
Bobby He
Thomas Hofmann
Chris J. Maddison
Daniel M. Roy
318
44
0
30 Jun 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
287
122
0
06 Apr 2023
Understanding plasticity in neural networks
International Conference on Machine Learning (ICML), 2023
Clare Lyle
Zeyu Zheng
Evgenii Nikishin
Bernardo Avila-Pires
Razvan Pascanu
Will Dabney
AI4CE
503
136
0
02 Mar 2023
Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
International Conference on Machine Learning (ICML), 2023
Yuchen Liu
Chen Chen
Lingjuan Lyu
Fangzhao Wu
Sai Wu
Gang Chen
262
22
0
13 Feb 2023
Width and Depth Limits Commute in Residual Networks
International Conference on Machine Learning (ICML), 2023
Soufiane Hayou
Greg Yang
239
17
0
01 Feb 2023
On the Initialisation of Wide Low-Rank Feedforward Neural Networks
Thiziri Nait Saada
Jared Tanner
172
2
0
31 Jan 2023
Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features
Adityanarayanan Radhakrishnan
Daniel Beaglehole
Parthe Pandit
M. Belkin
FAtt
MLT
253
18
0
28 Dec 2022
Statistical Physics of Deep Neural Networks: Initialization toward Optimal Channels
Physical Review Research (Phys. Rev. Res.), 2022
Kangyu Weng
Aohua Cheng
Ziyang Zhang
Pei Sun
Yang Tian
276
5
0
04 Dec 2022
Analysis of Convolutions, Non-linearity and Depth in Graph Neural Networks using Neural Tangent Kernel
Mahalakshmi Sabanayagam
Pascal Esser
Debarghya Ghoshdastidar
379
4
0
18 Oct 2022
Component-Wise Natural Gradient Descent – An Efficient Neural Network Optimization
International Symposium on Computing and Networking - Across Practical Development and Theoretical Research (ISAPDTR), 2022
Tran van Sang
Mhd Irvan
R. Yamaguchi
Toshiyuki Nakata
198
1
0
11 Oct 2022
On Scrambling Phenomena for Randomly Initialized Recurrent Networks
Neural Information Processing Systems (NeurIPS), 2022
Vaggos Chatziafratis
Ioannis Panageas
Clayton Sanford
S. Stavroulakis
195
2
0
11 Oct 2022
On skip connections and normalisation layers in deep optimisation
Neural Information Processing Systems (NeurIPS), 2022
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
401
4
0
10 Oct 2022
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
213
2
0
05 Oct 2022
Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Andrea Ceni
ODL
156
11
0
03 Oct 2022
Omnigrok: Grokking Beyond Algorithmic Data
International Conference on Learning Representations (ICLR), 2022
Ziming Liu
Eric J. Michaud
Max Tegmark
360
111
0
03 Oct 2022
On the infinite-depth limit of finite-width neural networks
Soufiane Hayou
246
24
0
03 Oct 2022
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
Neural Information Processing Systems (NeurIPS), 2022
James Harrison
Luke Metz
Jascha Narain Sohl-Dickstein
235
30
0
22 Sep 2022
PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems
Qing Jin
Zhiyu Chen
J. Ren
Yanyu Li
Yanzhi Wang
Kai-Min Yang
MQ
135
7
0
18 Sep 2022
Scaling ResNets in the Large-depth Regime
Pierre Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
384
18
0
14 Jun 2022
The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization
Neural Information Processing Systems (NeurIPS), 2022
Mufan Li
Mihai Nica
Daniel M. Roy
384
44
0
06 Jun 2022
Entangled Residual Mappings
Mathias Lechner
Ramin Hasani
Z. Babaiee
Radu Grosu
Daniela Rus
T. Henzinger
Sepp Hochreiter
226
5
0
02 Jun 2022
Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Neural Information Processing Systems (NeurIPS), 2022
Michael E. Sander
Pierre Ablin
Gabriel Peyré
266
34
0
29 May 2022
Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Blake Bordelon
Cengiz Pehlevan
MLT
351
108
0
19 May 2022
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
Social Science Research Network (SSRN), 2022
R. Cont
Alain Rossier
Renyuan Xu
MLT
377
6
0
14 Apr 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
International Conference on Learning Representations (ICLR), 2022
Guodong Zhang
Aleksandar Botev
James Martens
OffRL
230
30
0
15 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
359
223
0
07 Mar 2022
Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization
International Conference on Machine Learning (ICML), 2022
Mariia Seleznova
Gitta Kutyniok
408
28
0
01 Feb 2022
Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications
Darshil Doshi
Tianyu He
Andrey Gromov
341
10
0
23 Nov 2021
Gradients are Not All You Need
Luke Metz
C. Freeman
S. Schoenholz
Tal Kachman
234
102
0
10 Nov 2021
A Johnson–Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum
Jan Hązła
Michael C. Gastpar
Anatoly Khina
180
0
0
03 Nov 2021
Free Probability for predicting the performance of feed-forward fully connected neural networks
Neural Information Processing Systems (NeurIPS), 2021
Reda Chhaibi
Tariq Daouda
E. Kahn
ODL
294
4
0
01 Nov 2021
Feature Learning and Signal Propagation in Deep Neural Networks
International Conference on Machine Learning (ICML), 2021
Yizhang Lou
Chris Mingard
Yoonsoo Nam
Soufiane Hayou
MDE
220
18
0
22 Oct 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Neural Information Processing Systems (NeurIPS), 2021
Mufan Li
Mihai Nica
Daniel M. Roy
316
36
0
07 Jun 2021
Regularization in ResNet with Stochastic Depth
Neural Information Processing Systems (NeurIPS), 2021
Soufiane Hayou
Fadhel Ayed
137
15
0
06 Jun 2021