Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Neural Information Processing Systems (NeurIPS), 2020
18 February 2020
Quynh N. Nguyen
Marco Mondelli
ODL, AI4CE

Papers citing "Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology"

50 / 60 papers shown
A general technique for approximating high-dimensional empirical kernel matrices
Chiraag Kaushik
Justin Romberg
Vidya Muthukumar
05 Nov 2025

Convergence Rates for Gradient Descent on the Edge of Stability in Overparametrised Least Squares
Lachlan Ewen MacDonald
Hancheng Min
Leandro Palma
Salma Tarmoun
Ziqing Xu
Rene Vidal
MLT
20 Oct 2025

Smooth Quasar-Convex Optimization with Constraints
David Martínez-Rubio
02 Oct 2025

Gradient Flow Convergence Guarantee for General Neural Network Architectures
Yash Jakhmola
MLT
28 Sep 2025

A Law of Data Reconstruction for Random Features (and Beyond)
Leonardo Iurada
Simone Bombari
Tatiana Tommasi
Marco Mondelli
26 Sep 2025

Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
Keitaro Sakamoto
Issei Sato
25 Sep 2025

A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu
Hancheng Min
Salma Tarmoun
Enrique Mallada
Rene Vidal
16 May 2025

Unraveling the Gradient Descent Dynamics of Transformers
Neural Information Processing Systems (NeurIPS), 2024
Bingqing Song
Boran Han
Shuai Zhang
Jie Ding
Mingyi Hong
AI4CE
12 Nov 2024
ActNAS: Generating Efficient YOLO Models using Activation NAS
Sudhakar Sah
Ravish Kumar
Darshan C. Ganji
Ehsan Saboori
11 Oct 2024

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
International Conference on Learning Representations (ICLR), 2024
Arthur Jacot
Peter Súkeník
Zihan Wang
Marco Mondelli
07 Oct 2024

In-Context Learning with Representations: Contextual Generalization of Trained Transformers
Neural Information Processing Systems (NeurIPS), 2024
Tong Yang
Yu Huang
Yingbin Liang
Yuejie Chi
MLT
19 Aug 2024

Invertible Neural Warp for NeRF
Shin-Fang Chng
Ravi Garg
Hemanth Saratchandran
Simon Lucey
17 Jul 2024

Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
Neural Information Processing Systems (NeurIPS), 2024
Kedar Karhadkar
Michael Murray
Guido Montúfar
23 May 2024

Approximation and Gradient Descent Training with Neural Networks
G. Welper
19 May 2024

Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2024
Yuling Jiao
Yanming Lai
Yang Wang
AI4CE
19 May 2024

Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Nima Hosseini Dashtbayaz
G. Farhani
Boyu Wang
Charles Ling
02 May 2024
Robust NAS under adversarial training: benchmark, theory, and beyond
Yongtao Wu
Fanghui Liu
Carl-Johann Simon-Gabriel
Grigorios G. Chrysos
Volkan Cevher
AAML, OOD
19 Mar 2024

Generalization of Scaled Deep ResNets in the Mean-Field Regime
International Conference on Learning Representations (ICLR), 2024
Yihang Chen
Fanghui Liu
Yiping Lu
Grigorios G. Chrysos
Volkan Cevher
14 Mar 2024

Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
08 Feb 2024

Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate Networks
Hemanth Saratchandran
Shin-Fang Chng
Simon Lucey
07 Feb 2024

Architectural Strategies for the optimization of Physics-Informed Neural Networks
Hemanth Saratchandran
Shin-Fang Chng
Simon Lucey
AI4CE
05 Feb 2024

On the Convergence of Encoder-only Shallow Transformers
Neural Information Processing Systems (NeurIPS), 2023
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
02 Nov 2023

On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
MLT
19 Oct 2023

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models
Neural Information Processing Systems (NeurIPS), 2023
Tianxiang Gao
Xiaokai Huo
Hailiang Liu
Hongyang Gao
BDL
16 Oct 2023

Approximation Results for Gradient Descent trained Neural Networks
G. Welper
09 Sep 2023
Implicit regularization of deep residual networks towards neural ODEs
International Conference on Learning Representations (ICLR), 2023
Pierre Marion
Yu-Han Wu
Michael E. Sander
Gérard Biau
03 Sep 2023

Deterministic equivalent of the Conjugate Kernel matrix associated to Artificial Neural Networks
Clément Chouard
09 Jun 2023

How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
International Conference on Machine Learning (ICML), 2023
Simone Bombari
Marco Mondelli
AAML
20 May 2023

On the effectiveness of neural priors in modeling dynamical systems
Sameera Ramasinghe
Hemanth Saratchandran
Violetta Shevchenko
Simon Lucey
10 Mar 2023

On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
Lu Xia
M. Hochstenbach
Stefano Massei
23 Jan 2023

Mechanistic Mode Connectivity
International Conference on Machine Learning (ICML), 2022
Ekdeep Singh Lubana
Eric J. Bigelow
Robert P. Dick
David M. Krueger
Hidenori Tanaka
15 Nov 2022

Characterizing the Spectrum of the NTK via a Power Series Expansion
International Conference on Learning Representations (ICLR), 2022
Michael Murray
Hui Jin
Benjamin Bowman
Guido Montúfar
15 Nov 2022

Overparameterized random feature regression with nearly orthogonal data
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Zhichao Wang
Yizhe Zhu
11 Nov 2022

Finite Sample Identification of Wide Shallow Neural Networks with Biases
M. Fornasier
T. Klock
Marco Mondelli
Michael Rauchensteiner
08 Nov 2022

On skip connections and normalisation layers in deep optimisation
Neural Information Processing Systems (NeurIPS), 2022
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
10 Oct 2022

Restricted Strong Convexity of Deep Learning Models with Smooth Activations
International Conference on Learning Representations (ICLR), 2022
A. Banerjee
Pedro Cisneros-Velarde
Libin Zhu
M. Belkin
29 Sep 2022
Magnitude and Angle Dynamics in Training Single ReLU Neurons
Neural Networks (NN), 2022
Sangmin Lee
Byeongsu Sim
Jong Chul Ye
MLT
27 Sep 2022

Approximation results for Gradient Descent trained Shallow Neural Networks in 1d
R. Gentile
G. Welper
ODL
17 Sep 2022

Generalization Properties of NAS under Activation and Skip Connection Search
Neural Information Processing Systems (NeurIPS), 2022
Zhenyu Zhu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
AI4CE
15 Sep 2022

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
13 Aug 2022

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
Neural Information Processing Systems (NeurIPS), 2022
Benjamin Bowman
Guido Montúfar
06 Jun 2022

Global Convergence of Over-parameterized Deep Equilibrium Models
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Zenan Ling
Xingyu Xie
Qiuhao Wang
Zongpeng Zhang
Zhouchen Lin
27 May 2022

A Framework for Overparameterized Learning
Dávid Terjék
Diego González-Sánchez
MLT
26 May 2022

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
Neural Information Processing Systems (NeurIPS), 2022
Simone Bombari
Mohammad Hossein Amani
Marco Mondelli
20 May 2022

Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths
Tianxiang Gao
Hongyang Gao
MLT
16 May 2022
Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution
Lam M. Nguyen
Trang H. Tran
Marten van Dijk
07 Feb 2022

Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks
International Conference on Learning Representations (ICLR), 2022
Benjamin Bowman
Guido Montúfar
12 Jan 2022

To Supervise or Not: How to Effectively Learn Wireless Interference Management Models?
Bingqing Song
Haoran Sun
Wenqiang Pu
Sijia Liu
Min-Fong Hong
28 Dec 2021

Rethinking Influence Functions of Neural Networks in the Over-parameterized Regime
Rui Zhang
Shihua Zhang
TDI
15 Dec 2021

SGD Through the Lens of Kolmogorov Complexity
Gregory Schwartzman
10 Nov 2021
Page 1 of 2