ResearchTrend.AI
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
arXiv: 1811.08888 · 21 November 2018
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
Tags: ODL
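
For readers skimming this citation list, here is a minimal sketch of the setting the paper analyzes: a deep, heavily over-parameterized fully connected ReLU network trained by plain minibatch SGD. This is illustrative only, not the authors' code; the width, depth, learning rate, and random data below are hypothetical choices standing in for the regime (width much larger than sample size) that the theory addresses.

```python
# Hypothetical sketch of the paper's setting (not the authors' code):
# an over-parameterized deep ReLU network trained with minibatch SGD.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, width, depth, n = 10, 1024, 5, 128   # over-parameterized: width >> n
X = torch.randn(n, d)                    # random inputs (illustrative)
y = torch.randint(0, 2, (n,)).float()    # binary labels

# Build a depth-layer fully connected ReLU network.
layers = [nn.Linear(d, width), nn.ReLU()]
for _ in range(depth - 2):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
net = nn.Sequential(*layers)

opt = torch.optim.SGD(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    idx = torch.randint(0, n, (32,))     # sample a minibatch
    loss = loss_fn(net(X[idx]).squeeze(-1), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()                           # one SGD update
```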

Papers citing "Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks"

Showing 41 of 91 citing papers.
Predicting Training Time Without Training
L. Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
28 Aug 2020

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu, Zhuoran Yang, Zhaoran Wang
02 Aug 2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
E. Moroshko, Suriya Gunasekar, Blake E. Woodworth, J. Lee, Nathan Srebro, Daniel Soudry
13 Jul 2020

Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
25 Jun 2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
Tags: OOD, MLT
08 Jun 2020

Feature Purification: How Adversarial Training Performs Robust Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
Tags: MLT, AAML
20 May 2020

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying
Tags: MLT
11 Mar 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
Tags: ODL
04 Mar 2020

Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning
Zixin Wen
Tags: SSL
17 Feb 2020

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Yi Zhang, Orestis Plevrakis, S. Du, Xingguo Li, Zhao-quan Song, Sanjeev Arora
16 Feb 2020

Memory capacity of neural networks with threshold and ReLU activations
Roman Vershynin
20 Jan 2020

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Shiyu Liang, Ruoyu Sun, R. Srikant
31 Dec 2019

Towards Understanding the Spectral Bias of Deep Learning
Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu
03 Dec 2019

Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou, Lihong Li, Quanquan Gu
11 Nov 2019

Enhanced Convolutional Neural Tangent Kernels
Zhiyuan Li, Ruosong Wang, Dingli Yu, S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora
03 Nov 2019

Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu, Qingcan Wang, Chao Ma
Tags: ODL, AI4CE
02 Nov 2019

Growing axons: greedy learning of neural networks with application to function approximation
Daria Fokina, Ivan V. Oseledets
28 Oct 2019

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Sanjeev Arora, S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
Tags: AAML
03 Oct 2019

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Yu Bai, J. Lee
03 Oct 2019

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Pan Xu, F. Gao, Quanquan Gu
18 Sep 2019

The generalization error of random features regression: Precise asymptotics and double descent curve
Song Mei, Andrea Montanari
14 Aug 2019

Kernel and Rich Regimes in Overparametrized Models
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, J. Lee, Daniel Soudry, Nathan Srebro
13 Jun 2019

Generalization bounds for deep convolutional neural networks
Philip M. Long, Hanie Sedghi
Tags: MLT
29 May 2019

Norm-based generalisation bounds for multi-class convolutional neural networks
Antoine Ledent, Waleed Mustafa, Yunwen Lei, Marius Kloft
29 May 2019

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang
Tags: ODL
28 May 2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?
Zeyuan Allen-Zhu, Yuanzhi Li
24 May 2019

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems
Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki
Tags: MLT
23 May 2019

A type of generalization error induced by initialization in deep neural networks
Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma
19 May 2019

Linearized two-layers neural networks in high dimension
Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Tags: MLT
27 Apr 2019

On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
26 Apr 2019

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections
E. Weinan, Chao Ma, Qingcan Wang, Lei Wu
Tags: MLT
10 Apr 2019

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak
Tags: NoLa
27 Mar 2019

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
Tags: MLT
24 Jan 2019

Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du, Wei Hu
24 Jan 2019

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, J. Lee, Haochuan Li, Liwei Wang, M. Tomizuka
Tags: ODL
09 Nov 2018

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun, S. Sra, Ali Jadbabaie
17 Oct 2018

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei, J. Lee, Qiang Liu, Tengyu Ma
12 Oct 2018

Learning One-hidden-layer ReLU Networks via Gradient Descent
Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu
Tags: MLT
20 Jun 2018

Benefits of depth in neural networks
Matus Telgarsky
14 Feb 2016

Norm-Based Capacity Control in Neural Networks
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro
27 Feb 2015

The Loss Surfaces of Multilayer Networks
A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun
Tags: ODL
30 Nov 2014