A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

8 April 2019
E. Weinan
Chao Ma
Lei Wu
    MLT

Papers citing "A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics"

50 / 91 papers shown
Scalable Complexity Control Facilitates Reasoning Ability of LLMs
Liangkai Hang
Junjie Yao
Zhiwei Bai
Jiahao Huo
Yang Chen
...
Feiyu Xiong
Y. Zhang
Weinan E
Hongkang Yang
Zhi-hai Xu
LRM
205
2
0
29 May 2025
Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables
Yaru Liu
Yiqi Gu
Michael K. Ng
ODL
217
1
0
30 Apr 2025
Orthogonal greedy algorithm for linear operator learning with shallow neural network. Journal of Computational Physics (JCP), 2025
Ye Lin
Jiwei Jia
Young Ju Lee
Ran Zhang
290
2
0
06 Jan 2025
Nesterov acceleration in benignly non-convex landscapes. International Conference on Learning Representations (ICLR), 2024
Kanan Gupta
Stephan Wojtowytsch
295
4
0
10 Oct 2024
Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training. Journal of Artificial Intelligence Research (JAIR), 2024
J. Chaudhary
Dipak Nidhi
J. Heikkonen
H. Merisaari
R. Kanth
153
0
0
25 Sep 2024
Deep Learning without Global Optimization by Random Fourier Neural Networks
Owen Davis
Gianluca Geraci
Mohammad Motamed
BDL
326
1
0
16 Jul 2024
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing. Neural Information Processing Systems (NeurIPS), 2024
Zhongwang Zhang
Pengxiao Lin
Zhiwei Wang
Yaoyu Zhang
Z. Xu
600
3
0
08 May 2024
Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features
Aku Kammonen
Lisi Liang
Anamika Pandey
Raúl Tempone
274
3
0
01 Feb 2024
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias. Neural Information Processing Systems (NeurIPS), 2023
Jiyoung Park
Ian Pelakh
Stephan Wojtowytsch
214
1
0
10 Nov 2023
A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces
Jonathan W. Siegel
Stephan Wojtowytsch
204
5
0
26 Oct 2023
How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent. Journal of Statistical Planning and Inference (JSPI), 2023
Mike Nguyen
Nicole Mücke
MLT
267
6
0
14 Sep 2023
What can a Single Attention Layer Learn? A Study Through the Random Features Lens. Neural Information Processing Systems (NeurIPS), 2023
Hengyu Fu
Tianyu Guo
Yu Bai
Song Mei
MLT
201
35
0
21 Jul 2023
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space. Journal of Machine Learning Research (JMLR), 2023
Zhengdao Chen
351
3
0
03 Jul 2023
Gibbs-Based Information Criteria and the Over-Parameterized Regime. International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Haobo Chen
Yuheng Bu
Greg Wornell
324
1
0
08 Jun 2023
The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces
Hongrui Chen
Jihao Long
Lei Wu
155
0
0
05 Jun 2023
Benign Overfitting in Deep Neural Networks under Lazy Training. International Conference on Machine Learning (ICML), 2023
Zhenyu Zhu
Fanghui Liu
Grigorios G. Chrysos
Francesco Locatello
Volkan Cevher
AI4CE
200
12
0
30 May 2023
Understanding the Initial Condensation of Convolutional Neural Networks. CSIAM Transactions on Applied Mathematics (TCAM), 2023
Zhangchen Zhou
Hanxu Zhou
Yuqing Li
Zhi-Qin John Xu
MLT AI4CE
174
6
0
17 May 2023
Reinforcement Learning with Function Approximation: From Linear to Nonlinear. Journal of Machine Learning (JML), 2023
Jihao Long
Jiequn Han
270
8
0
20 Feb 2023
SPADE4: Sparsity and Delay Embedding based Forecasting of Epidemics. Bulletin of Mathematical Biology (Bull. Math. Biol.), 2022
Esha Saha
L. Ho
Giang Tran
198
7
0
11 Nov 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks
Zhengdao Chen
Eric Vanden-Eijnden
Joan Bruna
MLT
321
5
0
28 Oct 2022
Gradient descent provably escapes saddle points in the training of shallow ReLU networks. Journal of Optimization Theory and Applications (JOTA), 2022
Patrick Cheridito
Arnulf Jentzen
Florian Rossmannek
238
8
0
03 Aug 2022
Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime. Neural Information Processing Systems (NeurIPS), 2022
Benjamin Bowman
Guido Montúfar
268
17
0
06 Jun 2022
Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods. International Conference on Learning Representations (ICLR), 2022
Shunta Akiyama
Taiji Suzuki
199
9
0
30 May 2022
Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width. Neural Information Processing Systems (NeurIPS), 2022
Hanxu Zhou
Qixuan Zhou
Zhenyuan Jin
Yaoyu Zhang
Zhi-Qin John Xu
236
22
0
24 May 2022
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes. Journal of Machine Learning (JML), 2022
Chao Ma
D. Kunin
Lei Wu
Lexing Ying
207
35
0
24 Apr 2022
Convergence of gradient descent for deep neural networks
S. Chatterjee
ODL
289
28
0
30 Mar 2022
Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Tianyi Liu
Yan Li
Enlu Zhou
Tuo Zhao
127
1
0
07 Feb 2022
Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks. International Conference on Learning Representations (ICLR), 2022
Benjamin Bowman
Guido Montúfar
196
13
0
12 Jan 2022
Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions
Martin Hutzenthaler
Arnulf Jentzen
Katharina Pohl
Adrian Riekert
Luca Scarpa
MLT
300
10
0
13 Dec 2021
Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation
Simon Eberle
Arnulf Jentzen
Adrian Riekert
G. Weiss
159
13
0
18 Aug 2021
A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions. Journal of Machine Learning Research (JMLR), 2021
Arnulf Jentzen
Adrian Riekert
215
19
0
10 Aug 2021
Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation. Journal of Mathematical Analysis and Applications (JMAA), 2021
Arnulf Jentzen
Adrian Riekert
192
26
0
09 Jul 2021
On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting. International Conference on Machine Learning (ICML), 2021
Shunta Akiyama
Taiji Suzuki
MLT
231
16
0
11 Jun 2021
Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off
Huiyuan Wang
Wei Lin
MLT
165
5
0
09 Jun 2021
Embedding Principle of Loss Landscape of Deep Neural Networks. Neural Information Processing Systems (NeurIPS), 2021
Yaoyu Zhang
Zhongwang Zhang
Yaoyu Zhang
Z. Xu
241
42
0
30 May 2021
Towards Understanding the Condensation of Neural Networks at Initial Training. Neural Information Processing Systems (NeurIPS), 2021
Hanxu Zhou
Qixuan Zhou
Yaoyu Zhang
Z. Xu
MLT AI4CE
364
32
0
25 May 2021
Generalization Guarantees for Neural Architecture Search with Train-Validation Split. International Conference on Machine Learning (ICML), 2021
Samet Oymak
Mingchen Li
Mahdi Soltanolkotabi
AI4CE OOD
254
19
0
29 Apr 2021
An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation. CSIAM Transactions on Applied Mathematics (CSIAM Trans. Appl. Math.), 2021
Jihao Long
Jiequn Han
Weinan E
OffRL
171
15
0
15 Apr 2021
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions. Zeitschrift für Angewandte Mathematik und Physik (ZAMP), 2021
Arnulf Jentzen
Adrian Riekert
MLT
211
16
0
01 Apr 2021
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases
Arnulf Jentzen
T. Kröger
ODL
183
8
0
23 Feb 2021
A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions. Journal of Complexity (JC), 2021
Patrick Cheridito
Arnulf Jentzen
Adrian Riekert
Florian Rossmannek
134
27
0
19 Feb 2021
Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks. Chinese Physics Letters (CPL), 2021
Yaoyu Zhang
Zheng Ma
Zhi-Qin John Xu
183
23
0
30 Jan 2021
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2021
Cong Fang
Hangfeng He
Qi Long
Weijie J. Su
FAtt
468
206
0
29 Jan 2021
Implicit Bias of Linear RNNs. International Conference on Machine Learning (ICML), 2021
M Motavali Emami
Mojtaba Sahraee-Ardakan
Parthe Pandit
S. Rangan
A. Fletcher
171
13
0
19 Jan 2021
Strong overall error analysis for the training of artificial neural networks via random initializations. Communications in Mathematics and Statistics (Commun. Math. Stat.), 2020
Arnulf Jentzen
Adrian Riekert
204
3
0
15 Dec 2020
On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers. Mathematical and Scientific Machine Learning (MSML), 2020
E. Weinan
Stephan Wojtowytsch
276
48
0
10 Dec 2020
Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods. International Conference on Learning Representations (ICLR), 2020
Taiji Suzuki
Shunta Akiyama
MLT
221
12
0
06 Dec 2020
On the exact computation of linear frequency principle dynamics and its generalization
Yaoyu Zhang
Zheng Ma
Z. Xu
Yaoyu Zhang
181
23
0
15 Oct 2020
Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't. CSIAM Transactions on Applied Mathematics (CSIAM Trans. Appl. Math.), 2020
E. Weinan
Chao Ma
Stephan Wojtowytsch
Lei Wu
AI4CE
321
146
0
22 Sep 2020
The Slow Deterioration of the Generalization Error of the Random Feature Model. Mathematical and Scientific Machine Learning (MSML), 2020
Chao Ma
Lei Wu
E. Weinan
131
16
0
13 Aug 2020