Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

18 February 2020
Quynh N. Nguyen, Marco Mondelli

Papers citing "Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology"

Showing 50 of 54 citing papers.
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
  Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal (16 May 2025)

Unraveling the Gradient Descent Dynamics of Transformers
  Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong (12 Nov 2024)

ActNAS : Generating Efficient YOLO Models using Activation NAS
  Sudhakar Sah, Ravish Kumar, Darshan C. Ganji, Ehsan Saboori (11 Oct 2024)

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
  Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli (07 Oct 2024)

In-Context Learning with Representations: Contextual Generalization of Trained Transformers
  Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi (19 Aug 2024)

Invertible Neural Warp for NeRF
  Shin-Fang Chng, Ravi Garg, Hemanth Saratchandran, Simon Lucey (17 Jul 2024)

Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
  Kedar Karhadkar, Michael Murray, Guido Montúfar (23 May 2024)

Approximation and Gradient Descent Training with Neural Networks
  G. Welper (19 May 2024)

Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method
  Yuling Jiao, Yanming Lai, Yang Wang (19 May 2024)

Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations
  Nima Hosseini Dashtbayaz, G. Farhani, Boyu Wang, Charles Ling (02 May 2024)

Robust NAS under adversarial training: benchmark, theory, and beyond
  Yongtao Wu, Fanghui Liu, Carl-Johann Simon-Gabriel, Grigorios G. Chrysos, Volkan Cevher (19 Mar 2024)

Generalization of Scaled Deep ResNets in the Mean-Field Regime
  Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios G. Chrysos, Volkan Cevher (14 Mar 2024)

Implicit Bias and Fast Convergence Rates for Self-attention
  Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis (08 Feb 2024)

Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate Networks
  Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey (07 Feb 2024)

Architectural Strategies for the optimization of Physics-Informed Neural Networks
  Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey (05 Feb 2024)

On the Convergence of Encoder-only Shallow Transformers
  Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher (02 Nov 2023)

On the Optimization and Generalization of Multi-head Attention
  Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis (19 Oct 2023)

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models
  Tianxiang Gao, Xiaokai Huo, Hailiang Liu, Hongyang Gao (16 Oct 2023)

Approximation Results for Gradient Descent trained Neural Networks
  G. Welper (09 Sep 2023)

Implicit regularization of deep residual networks towards neural ODEs
  Pierre Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau (03 Sep 2023)

Deterministic equivalent of the Conjugate Kernel matrix associated to Artificial Neural Networks
  Clément Chouard (09 Jun 2023)

How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
  Simone Bombari, Marco Mondelli (20 May 2023)

On the effectiveness of neural priors in modeling dynamical systems
  Sameera Ramasinghe, Hemanth Saratchandran, Violetta Shevchenko, Simon Lucey (10 Mar 2023)

On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
  Lu Xia, M. Hochstenbach, Stefano Massei (23 Jan 2023)

Mechanistic Mode Connectivity
  Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka (15 Nov 2022)

Characterizing the Spectrum of the NTK via a Power Series Expansion
  Michael Murray, Hui Jin, Benjamin Bowman, Guido Montúfar (15 Nov 2022)

Overparameterized random feature regression with nearly orthogonal data
  Zhichao Wang, Yizhe Zhu (11 Nov 2022)

Finite Sample Identification of Wide Shallow Neural Networks with Biases
  M. Fornasier, T. Klock, Marco Mondelli, Michael Rauchensteiner (08 Nov 2022)

On skip connections and normalisation layers in deep optimisation
  L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey (10 Oct 2022)

Restricted Strong Convexity of Deep Learning Models with Smooth Activations
  A. Banerjee, Pedro Cisneros-Velarde, Libin Zhu, M. Belkin (29 Sep 2022)

Magnitude and Angle Dynamics in Training Single ReLU Neurons
  Sangmin Lee, Byeongsu Sim, Jong Chul Ye (27 Sep 2022)

Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$
  R. Gentile, G. Welper (17 Sep 2022)

Generalization Properties of NAS under Activation and Skip Connection Search
  Zhenyu Zhu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher (15 Sep 2022)

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
  Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan (13 Aug 2022)

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
  Benjamin Bowman, Guido Montúfar (06 Jun 2022)

Global Convergence of Over-parameterized Deep Equilibrium Models
  Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin (27 May 2022)

A Framework for Overparameterized Learning
  Dávid Terjék, Diego González-Sánchez (26 May 2022)

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
  Simone Bombari, Mohammad Hossein Amani, Marco Mondelli (20 May 2022)

Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths
  Tianxiang Gao, Hongyang Gao (16 May 2022)

Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution
  Lam M. Nguyen, Trang H. Tran, Marten van Dijk (07 Feb 2022)

Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks
  Benjamin Bowman, Guido Montúfar (12 Jan 2022)

To Supervise or Not: How to Effectively Learn Wireless Interference Management Models?
  Bingqing Song, Haoran Sun, Wenqiang Pu, Sijia Liu, Min-Fong Hong (28 Dec 2021)

Rethinking Influence Functions of Neural Networks in the Over-parameterized Regime
  Rui Zhang, Shihua Zhang (15 Dec 2021)

SGD Through the Lens of Kolmogorov Complexity
  Gregory Schwartzman (10 Nov 2021)

Subquadratic Overparameterization for Shallow Neural Networks
  Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher (02 Nov 2021)

A global convergence theory for deep ReLU implicit networks via over-parameterization
  Tianxiang Gao, Hailiang Liu, Jia Liu, Hridesh Rajan, Hongyang Gao (11 Oct 2021)

Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
  Zhichao Wang, Yizhe Zhu (20 Sep 2021)

Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time
  Yuyang Deng, Mohammad Mahdi Kamani, M. Mahdavi (22 Jul 2021)

Generalization of GANs and overparameterized models under Lipschitz continuity
  Khoat Than, Nghia D. Vu (06 Apr 2021)

When Are Solutions Connected in Deep Networks?
  Quynh N. Nguyen, Pierre Bréchet, Marco Mondelli (18 Feb 2021)