Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

18 February 2019 · arXiv:1902.06720
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington

Papers citing "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent" (showing 50 of 288)

Fourier-domain Variational Formulation and Its Well-posedness for Supervised Learning
Tao Luo, Zheng Ma, Zhiwei Wang, Zhi-Qin John Xu, Yaoyu Zhang
OOD · 06 Dec 2020

Gradient Starvation: A Learning Proclivity in Neural Networks
Mohammad Pezeshki, Sekouba Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie
MLT · 18 Nov 2020

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
09 Nov 2020

Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourung Chen, Jaehoon Lee
DD · 30 Oct 2020

Scaling Laws for Autoregressive Generative Modeling
T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
28 Oct 2020
Are wider nets better given the same number of parameters?
A. Golubeva, Behnam Neyshabur, Guy Gur-Ari
27 Oct 2020

Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models
J. Rocks, Pankaj Mehta
26 Oct 2020

A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
Zhiqi Bu, Shiyun Xu, Kan Chen
25 Oct 2020

Stable ResNet
Soufiane Hayou, Eugenio Clerico, Bo He, George Deligiannidis, Arnaud Doucet, Judith Rousseau
ODL, SSeg · 24 Oct 2020

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi, Jianfeng Lu
22 Oct 2020

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
Guangda Ji, Zhanxing Zhu
20 Oct 2020
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
T. Doan, Mehdi Abbana Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier
CLL · 07 Oct 2020

On the linearity of large non-linear models: when and why the tangent kernel is constant
Chaoyue Liu, Libin Zhu, M. Belkin
02 Oct 2020

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks
Ryo Karakida, Kazuki Osawa
02 Oct 2020

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
Keyulu Xu, Mozhi Zhang, Jingling Li, S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
MLT · 24 Sep 2020

Tensor Programs III: Neural Matrix Laws
Greg Yang
22 Sep 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Predicting Training Time Without Training
L. Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
28 Aug 2020
Deep Networks and the Multiple Manifold Problem
Sam Buchanan, D. Gilboa, John N. Wright
25 Aug 2020

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu, Zhuoran Yang, Zhaoran Wang
02 Aug 2020

When and why PINNs fail to train: A neural tangent kernel perspective
Sizhuang He, Xinling Yu, P. Perdikaris
28 Jul 2020

The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
Andrea Montanari, Yiqiao Zhong
25 Jul 2020

Phase diagram for two-layer ReLU neural networks at infinite-width limit
Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang
15 Jul 2020
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Zhaoran Wang, Mladen Kolar
02 Jul 2020

Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Yaniv Blumenfeld, D. Gilboa, Daniel Soudry
ODL · 02 Jul 2020

Associative Memory in Iterated Overparameterized Sigmoid Autoencoders
Yibo Jiang, Cengiz Pehlevan
30 Jun 2020

Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
25 Jun 2020

Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks
Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan
23 Jun 2020

Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent
Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama
CLL · 21 Jun 2020

An analytic theory of shallow networks dynamics for hinge loss classification
Franco Pellegrini, Giulio Biroli
19 Jun 2020
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
Matthew Tancik, Pratul P. Srinivasan, B. Mildenhall, Sara Fridovich-Keil, N. Raghavan, Utkarsh Singhal, R. Ramamoorthi, Jonathan T. Barron, Ren Ng
18 Jun 2020

Directional Pruning of Deep Neural Networks
Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
ODL · 16 Jun 2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
OOD, MLT · 08 Jun 2020

Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks
Z. Fan, Zhichao Wang
25 May 2020

Consistency of Empirical Bayes And Kernel Flow For Hierarchical Parameter Estimation
Yifan Chen, H. Owhadi, Andrew M. Stuart
22 May 2020

Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes
Sebastian W. Ober, Laurence Aitchison
BDL · 17 May 2020

Learning the gravitational force law and other analytic functions
Atish Agarwala, Abhimanyu Das, Rina Panigrahy, Qiuyi Zhang
MLT · 15 May 2020
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
Fanghui Liu, Xiaolin Huang, Yudong Chen, Johan A. K. Suykens
BDL · 23 Apr 2020

Frequency Bias in Neural Networks for Input of Non-Uniform Density
Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, S. Kritchman
10 Mar 2020

Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations
Aditya Golatkar, Alessandro Achille, Stefano Soatto
MU, OOD · 05 Mar 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL · 04 Mar 2020

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Chaoyue Liu, Libin Zhu, M. Belkin
ODL · 29 Feb 2020

Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
G. Yoo, H. Owhadi
19 Feb 2020

Learning Parities with Neural Networks
Amit Daniely, Eran Malach
18 Feb 2020

Self-Distillation Amplifies Regularization in Hilbert Space
H. Mobahi, Mehrdad Farajtabar, Peter L. Bartlett
13 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
AI4CE · 12 Feb 2020

Machine Unlearning: Linear Filtration for Logit-based Classifiers
Thomas Baumhauer, Pascal Schöttle, Matthias Zeppelzauer
MU · 07 Feb 2020

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan
07 Feb 2020

On the infinite width limit of neural networks with a standard parameterization
Jascha Narain Sohl-Dickstein, Roman Novak, S. Schoenholz, Jaehoon Lee
21 Jan 2020