Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks (arXiv:1811.08888)

21 November 2018
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
ODL

Papers citing "Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks"

50 / 91 papers shown

  • Statistically guided deep learning · Michael Kohler, A. Krzyżak · ODL, BDL · 11 Apr 2025
  • Extended convexity and smoothness and their applications in deep learning · Binchuan Qi, Wei Gong, Li Li · 08 Oct 2024
  • Loss Gradient Gaussian Width based Generalization and Optimization Guarantees · A. Banerjee, Qiaobo Li, Yingxue Zhou · 11 Jun 2024
  • NTK-Guided Few-Shot Class Incremental Learning · Jingren Liu, Zhong Ji, Yanwei Pang, Yunlong Yu · CLL · 19 Mar 2024
  • Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks · M. Stojnic · 13 Dec 2023
  • Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds · M. Stojnic · 13 Dec 2023
  • Differentially Private Non-convex Learning for Multi-layer Neural Networks · Hanpu Shen, Cheng-Long Wang, Zihang Xiang, Yiming Ying, Di Wang · 12 Oct 2023
  • Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss · T. Getu, Georges Kaddoum, M. Bennis · 13 Sep 2023
  • Learning Prescriptive ReLU Networks · Wei-Ju Sun, Asterios Tsiourvas · 01 Jun 2023
  • An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models · Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang · 30 Dec 2022
  • Characterizing the Spectrum of the NTK via a Power Series Expansion · Michael Murray, Hui Jin, Benjamin Bowman, Guido Montúfar · 15 Nov 2022
  • Multilayer Perceptron Network Discriminates Larval Zebrafish Genotype using Behaviour · Christopher Fusco, Angel G Allen · 06 Nov 2022
  • When Expressivity Meets Trainability: Fewer than n Neurons Can Work · Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Z. Luo · 21 Oct 2022
  • Global Convergence of SGD On Two Layer Neural Nets · Pulkit Gopalani, Anirbit Mukherjee · 20 Oct 2022
  • Neural Networks can Learn Representations with Gradient Descent · Alexandru Damian, Jason D. Lee, Mahdi Soltanolkotabi · SSL, MLT · 30 Jun 2022
  • Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction · Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · FAtt · 14 Jun 2022
  • On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms · Lam M. Nguyen, Trang H. Tran · 13 Jun 2022
  • Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials · Eshaan Nichani, Yunzhi Bai, Jason D. Lee · 08 Jun 2022
  • Non-convex online learning via algorithmic equivalence · Udaya Ghai, Zhou Lu, Elad Hazan · 30 May 2022
  • Global Convergence of Over-parameterized Deep Equilibrium Models · Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin · 27 May 2022
  • Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width · Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu · 24 May 2022
  • Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes · Chao Ma, D. Kunin, Lei Wu, Lexing Ying · 24 Apr 2022
  • Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks · Benjamin Bowman, Guido Montúfar · 12 Jan 2022
  • On the Convergence and Robustness of Adversarial Training · Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, Quanquan Gu · AAML · 15 Dec 2021
  • Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions · Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa · MLT · 13 Dec 2021
  • On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons · Fangshuo Liao, Anastasios Kyrillidis · 05 Dec 2021
  • Embedding Principle: a hierarchical structure of loss landscape of deep neural networks · Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Z. Xu · 30 Nov 2021
  • Learning with convolution and pooling operations in kernel methods · Theodor Misiakiewicz, Song Mei · MLT · 16 Nov 2021
  • Subquadratic Overparameterization for Shallow Neural Networks · Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, V. Cevher · 02 Nov 2021
  • Provable Regret Bounds for Deep Online Learning and Control · Xinyi Chen, Edgar Minasyan, Jason D. Lee, Elad Hazan · 15 Oct 2021
  • A global convergence theory for deep ReLU implicit networks via over-parameterization · Tianxiang Gao, Hailiang Liu, Jia Liu, Hridesh Rajan, Hongyang Gao · MLT · 11 Oct 2021
  • Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent · Spencer Frei, Quanquan Gu · 25 Jun 2021
  • The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization · Mufan Bill Li, Mihai Nica, Daniel M. Roy · 07 Jun 2021
  • Global Convergence of Three-layer Neural Networks in the Mean Field Regime · H. Pham, Phan-Minh Nguyen · MLT, AI4CE · 11 May 2021
  • Generalization Guarantees for Neural Architecture Search with Train-Validation Split · Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi · AI4CE, OOD · 29 Apr 2021
  • A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions · Arnulf Jentzen, Adrian Riekert · MLT · 01 Apr 2021
  • Experiments with Rich Regime Training for Deep Learning · Xinyan Li, A. Banerjee · 26 Feb 2021
  • Learning with invariances in random features and kernel models · Song Mei, Theodor Misiakiewicz, Andrea Montanari · OOD · 25 Feb 2021
  • Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases · Arnulf Jentzen, T. Kröger · ODL · 23 Feb 2021
  • A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions · Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek · 19 Feb 2021
  • Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training · Cong Fang, Hangfeng He, Qi Long, Weijie J. Su · FAtt · 29 Jan 2021
  • Advances in Electron Microscopy with Deep Learning · Jeffrey M. Ede · 04 Jan 2021
  • Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning · Zeyuan Allen-Zhu, Yuanzhi Li · FedML · 17 Dec 2020
  • On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces · Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan · 09 Nov 2020
  • Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime · Andrea Agazzi, Jianfeng Lu · 22 Oct 2020
  • Prediction intervals for Deep Neural Networks · Tullio Mancini, Hector F. Calvo-Pardo, Jose Olmo · UQCV, OOD · 08 Oct 2020
  • On the linearity of large non-linear models: when and why the tangent kernel is constant · Chaoyue Liu, Libin Zhu, M. Belkin · 02 Oct 2020
  • Deep Equals Shallow for ReLU Networks in Kernel Regimes · A. Bietti, Francis R. Bach · 30 Sep 2020
  • Review: Deep Learning in Electron Microscopy · Jeffrey M. Ede · 17 Sep 2020
  • GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training · Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang · GNN · 07 Sep 2020