Mean Field Residual Networks: On the Edge of Chaos
Neural Information Processing Systems (NeurIPS), 2017
Greg Yang, S. Schoenholz
24 December 2017

Papers citing "Mean Field Residual Networks: On the Edge of Chaos"

Showing 50 of 130 citing papers.

Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
Ankur Mali, Lawrence Hall, Jake Williams, Gordon Richards
09 Oct 2025

Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Haosong Zhang, Shenxi Wu, Yichi Zhang, Wei Lin, W. Lin
05 Oct 2025

Toward a Physics of Deep Learning and Brains
Arsham Ghavasieh, Meritxell Vila-Minana, Akanksha Khurd, John Beggs, Gerardo Ortiz, Santo Fortunato
26 Sep 2025

ResNets Are Deeper Than You Think
Christian H.X. Ali Mehmeti-Göpel, Michael Wand
17 Jun 2025

Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
Yihe Dong, Lorenzo Noci, Mikhail Khodak, Mufan Li
01 Jun 2025

Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
Alessio Giorlandino, Sebastian Goldt
30 May 2025

GradAlign for Training-free Model Performance Inference
Yuxuan Li, Yunhui Guo
29 Nov 2024

Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo, Heeyoul Choi
21 Oct 2024

Collective variables of neural networks: empirical time evolution and scaling laws
S. Tovey, Sven Krippendorf, M. Spannowsky, Konstantin Nikolaou, Christian Holm
09 Oct 2024

UnitNorm: Rethinking Normalization for Transformers in Time Series
Nan Huang, C. Kümmerle, Xiang Zhang
AI4TS
24 May 2024

Principled Architecture-aware Scaling of Hyperparameters
Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin
AI4CE
27 Feb 2024

Deep Neural Network Initialization with Sparsity Inducing Activations
Ilan Price, Nicholas Daultry Ball, Samuel C.H. Lam, Adam C. Jones, Jared Tanner
AI4CE
25 Feb 2024

Neural Networks Asymptotic Behaviours for the Resolution of Inverse Problems
L. Debbio, Manuel Naviglio, Francesco Tarantelli
14 Feb 2024

Principled Weight Initialisation for Input-Convex Neural Networks
Pieter-Jan Hoedt, Günter Klambauer
19 Dec 2023

Commutative Width and Depth Scaling in Deep Neural Networks
Soufiane Hayou
02 Oct 2023

Fading memory as inductive bias in residual recurrent networks
Neural Networks (Neural Netw.), 2023
I. Dubinin, Felix Effenberger
27 Jul 2023

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Neural Information Processing Systems (NeurIPS), 2023
Lorenzo Noci, Chuning Li, Mufan Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy
30 Jun 2023

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness
MoE, LRM
06 Apr 2023

Understanding plasticity in neural networks
International Conference on Machine Learning (ICML), 2023
Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney
AI4CE
02 Mar 2023

Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
International Conference on Machine Learning (ICML), 2023
Yuchen Liu, Chen Chen, Lingjuan Lyu, Fangzhao Wu, Sai Wu, Gang Chen
13 Feb 2023

Width and Depth Limits Commute in Residual Networks
International Conference on Machine Learning (ICML), 2023
Soufiane Hayou, Greg Yang
01 Feb 2023

On the Initialisation of Wide Low-Rank Feedforward Neural Networks
Thiziri Nait Saada, Jared Tanner
31 Jan 2023

Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features
Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit, M. Belkin
FAtt, MLT
28 Dec 2022

Statistical Physics of Deep Neural Networks: Initialization toward Optimal Channels
Physical Review Research (Phys. Rev. Res.), 2022
Kangyu Weng, Aohua Cheng, Ziyang Zhang, Pei Sun, Yang Tian
04 Dec 2022

Analysis of Convolutions, Non-linearity and Depth in Graph Neural Networks using Neural Tangent Kernel
Mahalakshmi Sabanayagam, Pascal Esser, Debarghya Ghoshdastidar
18 Oct 2022

Component-Wise Natural Gradient Descent – An Efficient Neural Network Optimization
International Symposium on Computing and Networking - Across Practical Development and Theoretical Research (ISAPDTR), 2022
Tran van Sang, Mhd Irvan, R. Yamaguchi, Toshiyuki Nakata
11 Oct 2022

On Scrambling Phenomena for Randomly Initialized Recurrent Networks
Neural Information Processing Systems (NeurIPS), 2022
Vaggos Chatziafratis, Ioannis Panageas, Clayton Sanford, S. Stavroulakis
11 Oct 2022

On skip connections and normalisation layers in deep optimisation
Neural Information Processing Systems (NeurIPS), 2022
L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey
ODL
10 Oct 2022

Dynamical Isometry for Residual Networks
Advait Gadhikar, R. Burkholz
ODL, AI4CE
05 Oct 2022

Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Andrea Ceni
ODL
03 Oct 2022

Omnigrok: Grokking Beyond Algorithmic Data
International Conference on Learning Representations (ICLR), 2022
Ziming Liu, Eric J. Michaud, Max Tegmark
03 Oct 2022

On the infinite-depth limit of finite-width neural networks
Soufiane Hayou
03 Oct 2022

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
Neural Information Processing Systems (NeurIPS), 2022
James Harrison, Luke Metz, Jascha Narain Sohl-Dickstein
22 Sep 2022

PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems
Qing Jin, Zhiyu Chen, J. Ren, Yanyu Li, Yanzhi Wang, Kai-Min Yang
MQ
18 Sep 2022

Scaling ResNets in the Large-depth Regime
Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert
14 Jun 2022

The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization
Neural Information Processing Systems (NeurIPS), 2022
Mufan Li, Mihai Nica, Daniel M. Roy
06 Jun 2022

Entangled Residual Mappings
Mathias Lechner, Ramin Hasani, Z. Babaiee, Radu Grosu, Daniela Rus, T. Henzinger, Sepp Hochreiter
02 Jun 2022

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Neural Information Processing Systems (NeurIPS), 2022
Michael E. Sander, Pierre Ablin, Gabriel Peyré
29 May 2022

Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Blake Bordelon, Cengiz Pehlevan
MLT
19 May 2022

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
Social Science Research Network (SSRN), 2022
R. Cont, Alain Rossier, Renyuan Xu
MLT
14 Apr 2022

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
International Conference on Learning Representations (ICLR), 2022
Guodong Zhang, Aleksandar Botev, James Martens
OffRL
15 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
07 Mar 2022

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization
International Conference on Machine Learning (ICML), 2022
Mariia Seleznova, Gitta Kutyniok
01 Feb 2022

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications
Darshil Doshi, Tianyu He, Andrey Gromov
23 Nov 2021

Gradients are Not All You Need
Luke Metz, C. Freeman, S. Schoenholz, Tal Kachman
10 Nov 2021

A Johnson–Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum, Jan Hązła, Michael C. Gastpar, Anatoly Khina
03 Nov 2021

Free Probability for predicting the performance of feed-forward fully connected neural networks
Neural Information Processing Systems (NeurIPS), 2021
Reda Chhaibi, Tariq Daouda, E. Kahn
ODL
01 Nov 2021

Feature Learning and Signal Propagation in Deep Neural Networks
International Conference on Machine Learning (ICML), 2021
Yizhang Lou, Chris Mingard, Yoonsoo Nam, Soufiane Hayou
MDE
22 Oct 2021

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Neural Information Processing Systems (NeurIPS), 2021
Mufan Li, Mihai Nica, Daniel M. Roy
07 Jun 2021

Regularization in ResNet with Stochastic Depth
Neural Information Processing Systems (NeurIPS), 2021
Soufiane Hayou, Fadhel Ayed
06 Jun 2021