ResearchTrend.AI


arXiv:1811.03962 — Cited By
A Convergence Theory for Deep Learning via Over-Parameterization


9 November 2018
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao Song
    AI4CE
    ODL

Papers citing "A Convergence Theory for Deep Learning via Over-Parameterization"

50 of 370 citing papers shown
On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
Quynh N. Nguyen
47
48
0
24 Jan 2021
Reproducing Activation Function for Deep Learning
Senwei Liang
Liyao Lyu
Chunmei Wang
Haizhao Yang
36
21
0
13 Jan 2021
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks
Asaf Noy
Yi Tian Xu
Y. Aflalo
Lihi Zelnik-Manor
Rong Jin
41
3
0
12 Jan 2021
Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Spencer Frei
Yuan Cao
Quanquan Gu
FedML
MLT
70
19
0
04 Jan 2021
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training
Theodoros Tsiligkaridis
Jay Roberts
AAML
22
11
0
22 Dec 2020
Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks
Quynh N. Nguyen
Marco Mondelli
Guido Montúfar
25
81
0
21 Dec 2020
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu
Yuanzhi Li
FedML
60
356
0
17 Dec 2020
Estimation of the Mean Function of Functional Data via Deep Neural Networks
Shuoyang Wang
Guanqun Cao
Zuofeng Shang
35
20
0
08 Dec 2020
Gradient Starvation: A Learning Proclivity in Neural Networks
Mohammad Pezeshki
Sekouba Kaba
Yoshua Bengio
Aaron Courville
Doina Precup
Guillaume Lajoie
MLT
50
258
0
18 Nov 2020
Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting
Zeke Xie
Fengxiang He
Shaopeng Fu
Issei Sato
Dacheng Tao
Masashi Sugiyama
21
60
0
12 Nov 2020
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang
Chi Jin
Zhaoran Wang
Mengdi Wang
Michael I. Jordan
39
18
0
09 Nov 2020
Are wider nets better given the same number of parameters?
A. Golubeva
Behnam Neyshabur
Guy Gur-Ari
27
44
0
27 Oct 2020
A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
Zhiqi Bu
Shiyun Xu
Kan Chen
33
17
0
25 Oct 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi
Jianfeng Lu
13
15
0
22 Oct 2020
Deep Learning is Singular, and That's Good
Daniel Murfet
Susan Wei
Biwei Huang
Hui Li
Jesse Gell-Redman
T. Quella
UQCV
24
26
0
22 Oct 2020
A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun
Shankar Krishnan
H. Mobahi
MLT
18
80
0
06 Oct 2020
On the linearity of large non-linear models: when and why the tangent kernel is constant
Chaoyue Liu
Libin Zhu
M. Belkin
21
140
0
02 Oct 2020
Neural Thompson Sampling
Weitong Zhang
Dongruo Zhou
Lihong Li
Quanquan Gu
34
115
0
02 Oct 2020
Deep Equals Shallow for ReLU Networks in Kernel Regimes
A. Bietti
Francis R. Bach
30
86
0
30 Sep 2020
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
Keyulu Xu
Mozhi Zhang
Jingling Li
S. Du
Ken-ichi Kawarabayashi
Stefanie Jegelka
MLT
25
306
0
24 Sep 2020
Tensor Programs III: Neural Matrix Laws
Greg Yang
14
44
0
22 Sep 2020
Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS
Lin Chen
Sheng Xu
32
93
0
22 Sep 2020
Generalized Leverage Score Sampling for Neural Networks
J. Lee
Ruoqi Shen
Zhao Song
Mengdi Wang
Zheng Yu
21
42
0
21 Sep 2020
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training
Tianle Cai
Shengjie Luo
Keyulu Xu
Di He
Tie-Yan Liu
Liwei Wang
GNN
32
159
0
07 Sep 2020
Predicting Training Time Without Training
L. Zancato
Alessandro Achille
Avinash Ravichandran
Rahul Bhotika
Stefano Soatto
26
24
0
28 Aug 2020
Deep Networks and the Multiple Manifold Problem
Sam Buchanan
D. Gilboa
John N. Wright
166
39
0
25 Aug 2020
Multiple Descent: Design Your Own Generalization Curve
Lin Chen
Yifei Min
M. Belkin
Amin Karbasi
DRL
33
61
0
03 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
21
42
0
02 Aug 2020
The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
Andrea Montanari
Yiqiao Zhong
49
95
0
25 Jul 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
E. Moroshko
Suriya Gunasekar
Blake E. Woodworth
J. Lee
Nathan Srebro
Daniel Soudry
35
85
0
13 Jul 2020
Weak error analysis for stochastic gradient descent optimization algorithms
A. Bercher
Lukas Gonon
Arnulf Jentzen
Diyora Salimova
28
4
0
03 Jul 2020
On the Similarity between the Laplace and Neural Tangent Kernels
Amnon Geifman
A. Yadav
Yoni Kasten
Meirav Galun
David Jacobs
Ronen Basri
23
89
0
03 Jul 2020
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Luofeng Liao
You-Lin Chen
Zhuoran Yang
Bo Dai
Zhaoran Wang
Mladen Kolar
30
33
0
02 Jul 2020
Associative Memory in Iterated Overparameterized Sigmoid Autoencoders
Yibo Jiang
Cengiz Pehlevan
19
13
0
30 Jun 2020
Tensor Programs II: Neural Tangent Kernel for Any Architecture
Greg Yang
58
135
0
25 Jun 2020
Logarithmic Pruning is All You Need
Laurent Orseau
Marcus Hutter
Omar Rivasplata
28
88
0
22 Jun 2020
DO-Conv: Depthwise Over-parameterized Convolutional Layer
Jinming Cao
Yangyan Li
Mingchao Sun
Ying-Cong Chen
Dani Lischinski
Daniel Cohen-Or
Baoquan Chen
Changhe Tu
OOD
33
166
0
22 Jun 2020
Training (Overparametrized) Neural Networks in Near-Linear Time
Jan van den Brand
Binghui Peng
Zhao Song
Omri Weinstein
ODL
29
82
0
20 Jun 2020
Exploring Weight Importance and Hessian Bias in Model Pruning
Mingchen Li
Yahya Sattar
Christos Thrampoulidis
Samet Oymak
28
3
0
19 Jun 2020
An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Qi Qi
Zhishuai Guo
Yi Tian Xu
Rong Jin
Tianbao Yang
33
44
0
17 Jun 2020
Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting
Giorgos Bouritsas
Fabrizio Frasca
S. Zafeiriou
M. Bronstein
58
424
0
16 Jun 2020
Non-convergence of stochastic gradient descent in the training of deep neural networks
Patrick Cheridito
Arnulf Jentzen
Florian Rossmannek
14
37
0
12 Jun 2020
Directional convergence and alignment in deep learning
Ziwei Ji
Matus Telgarsky
20
164
0
11 Jun 2020
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang
Qi Cai
Zhuoran Yang
Yongxin Chen
Zhaoran Wang
OOD
MLT
141
11
0
08 Jun 2020
Is deeper better? It depends on locality of relevant features
Takashi Mori
Masahito Ueda
OOD
25
4
0
26 May 2020
Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks
Z. Fan
Zhichao Wang
44
71
0
25 May 2020
Feature Purification: How Adversarial Training Performs Robust Deep Learning
Zeyuan Allen-Zhu
Yuanzhi Li
MLT
AAML
39
147
0
20 May 2020
Orthogonal Over-Parameterized Training
Weiyang Liu
Rongmei Lin
Zhen Liu
James M. Rehg
Liam Paull
Li Xiong
Le Song
Adrian Weller
32
41
0
09 Apr 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu
Chao Ma
Yulong Lu
Jianfeng Lu
Lexing Ying
MLT
39
78
0
11 Mar 2020
Frequency Bias in Neural Networks for Input of Non-Uniform Density
Ronen Basri
Meirav Galun
Amnon Geifman
David Jacobs
Yoni Kasten
S. Kritchman
42
183
0
10 Mar 2020