Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

18 February 2020
Quynh N. Nguyen, Marco Mondelli

Papers citing "Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology"

Showing 50 of 54 citing papers.
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
  Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal (16 May 2025)

Unraveling the Gradient Descent Dynamics of Transformers
  Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong (12 Nov 2024)

ActNAS : Generating Efficient YOLO Models using Activation NAS
  Sudhakar Sah, Ravish Kumar, Darshan C. Ganji, Ehsan Saboori (11 Oct 2024)

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
  Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli (07 Oct 2024)

In-Context Learning with Representations: Contextual Generalization of Trained Transformers
  Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi (19 Aug 2024)

Invertible Neural Warp for NeRF
  Shin-Fang Chng, Ravi Garg, Hemanth Saratchandran, Simon Lucey (17 Jul 2024)

Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
  Kedar Karhadkar, Michael Murray, Guido Montúfar (23 May 2024)

Approximation and Gradient Descent Training with Neural Networks
  G. Welper (19 May 2024)

Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method
  Yuling Jiao, Yanming Lai, Yang Wang (19 May 2024)

Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations
  Nima Hosseini Dashtbayaz, G. Farhani, Boyu Wang, Charles Ling (02 May 2024)

Robust NAS under adversarial training: benchmark, theory, and beyond
  Yongtao Wu, Fanghui Liu, Carl-Johann Simon-Gabriel, Grigorios G. Chrysos, Volkan Cevher (19 Mar 2024)

Generalization of Scaled Deep ResNets in the Mean-Field Regime
  Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios G. Chrysos, Volkan Cevher (14 Mar 2024)

Implicit Bias and Fast Convergence Rates for Self-attention
  Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis (08 Feb 2024)

Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate Networks
  Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey (07 Feb 2024)

Architectural Strategies for the optimization of Physics-Informed Neural Networks
  Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey (05 Feb 2024)

On the Convergence of Encoder-only Shallow Transformers
  Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher (02 Nov 2023)

On the Optimization and Generalization of Multi-head Attention
  Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis (19 Oct 2023)

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models
  Tianxiang Gao, Xiaokai Huo, Hailiang Liu, Hongyang Gao (16 Oct 2023)

Approximation Results for Gradient Descent trained Neural Networks
  G. Welper (09 Sep 2023)

Implicit regularization of deep residual networks towards neural ODEs
  Pierre Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau (03 Sep 2023)

Deterministic equivalent of the Conjugate Kernel matrix associated to Artificial Neural Networks
  Clément Chouard (09 Jun 2023)

How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
  Simone Bombari, Marco Mondelli (20 May 2023)

On the effectiveness of neural priors in modeling dynamical systems
  Sameera Ramasinghe, Hemanth Saratchandran, Violetta Shevchenko, Simon Lucey (10 Mar 2023)

On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
  Lu Xia, M. Hochstenbach, Stefano Massei (23 Jan 2023)

Mechanistic Mode Connectivity
  Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka (15 Nov 2022)

Characterizing the Spectrum of the NTK via a Power Series Expansion
  Michael Murray, Hui Jin, Benjamin Bowman, Guido Montúfar (15 Nov 2022)

Overparameterized random feature regression with nearly orthogonal data
  Zhichao Wang, Yizhe Zhu (11 Nov 2022)

Finite Sample Identification of Wide Shallow Neural Networks with Biases
  M. Fornasier, T. Klock, Marco Mondelli, Michael Rauchensteiner (08 Nov 2022)

On skip connections and normalisation layers in deep optimisation
  L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey (10 Oct 2022)

Restricted Strong Convexity of Deep Learning Models with Smooth Activations
  A. Banerjee, Pedro Cisneros-Velarde, Libin Zhu, M. Belkin (29 Sep 2022)

Magnitude and Angle Dynamics in Training Single ReLU Neurons
  Sangmin Lee, Byeongsu Sim, Jong Chul Ye (27 Sep 2022)

Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$
  R. Gentile, G. Welper (17 Sep 2022)

Generalization Properties of NAS under Activation and Skip Connection Search
  Zhenyu Zhu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher (15 Sep 2022)

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
  Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan (13 Aug 2022)

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
  Benjamin Bowman, Guido Montúfar (06 Jun 2022)

Global Convergence of Over-parameterized Deep Equilibrium Models
  Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin (27 May 2022)

A Framework for Overparameterized Learning
  Dávid Terjék, Diego González-Sánchez (26 May 2022)

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
  Simone Bombari, Mohammad Hossein Amani, Marco Mondelli (20 May 2022)

Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths
  Tianxiang Gao, Hongyang Gao (16 May 2022)

Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution
  Lam M. Nguyen, Trang H. Tran, Marten van Dijk (07 Feb 2022)

Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks
  Benjamin Bowman, Guido Montúfar (12 Jan 2022)

To Supervise or Not: How to Effectively Learn Wireless Interference Management Models?
  Bingqing Song, Haoran Sun, Wenqiang Pu, Sijia Liu, Min-Fong Hong (28 Dec 2021)

Rethinking Influence Functions of Neural Networks in the Over-parameterized Regime
  Rui Zhang, Shihua Zhang (15 Dec 2021)

SGD Through the Lens of Kolmogorov Complexity
  Gregory Schwartzman (10 Nov 2021)

Subquadratic Overparameterization for Shallow Neural Networks
  Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher (02 Nov 2021)

A global convergence theory for deep ReLU implicit networks via over-parameterization
  Tianxiang Gao, Hailiang Liu, Jia Liu, Hridesh Rajan, Hongyang Gao (11 Oct 2021)

Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
  Zhichao Wang, Yizhe Zhu (20 Sep 2021)

Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time
  Yuyang Deng, Mohammad Mahdi Kamani, M. Mahdavi (22 Jul 2021)

Generalization of GANs and overparameterized models under Lipschitz continuity
  Khoat Than, Nghia D. Vu (06 Apr 2021)

When Are Solutions Connected in Deep Networks?
  Quynh N. Nguyen, Pierre Bréchet, Marco Mondelli (18 Feb 2021)