ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks (arXiv:1810.02281)
4 October 2018
Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

Papers citing "A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks" (50 of 209 shown)
TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
Lingyu Jiang, Lingyu Xu, Peiran Li, Qianwen Ge, Dingyi Zhuang, ..., Ziming Zhang, Zhengzhong Tu, Michael R. Zielewski, Kazunori D Yamada, Fangzhou Lin
AI4TS · 23 Nov 2025

Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
Hancheng Min, Zhihui Zhu, Rene Vidal
24 Oct 2025

TreeNet: Layered Decision Ensembles
Zeshan Khan
07 Oct 2025

Learning Regularization Functionals for Inverse Problems: A Comparative Study
J. Hertrich, Matthias Joachim Ehrhardt, Alexander Denker, Stanislas Ducotterd, Zhenghan Fang, ..., German Shâma Wache, Martin Zach, Yasi Zhang, Matthias Joachim Ehrhardt, Sebastian Neumayer
02 Oct 2025

Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar
29 Sep 2025

Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Tianyi Chen, Pengxiao Lin, Zhiwei Wang, Zhi-Qin John Xu
Mamba · 22 Sep 2025

GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping
Qifu Wen, Xi Zeng, Zihan Zhou, Shuaijun Liu, M. Hosseinzadeh, Ningxin Su, Reza Rawassizadeh
01 Sep 2025
On Task Vectors and Gradients
Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe Alessio D’Inverno, Fabrizio Silvestri, Emanuele Rodolà
MoMe · 22 Aug 2025

Wormhole Dynamics in Deep Neural Networks (IEEE Transactions on Neural Networks and Learning Systems, 2025)
Yen-Lung Lai, Zhe Jin
AI4CE · 20 Aug 2025

Learning with Confidence (Conference on Uncertainty in Artificial Intelligence, 2025)
Oliver Ethan Richardson
14 Aug 2025

Intrinsic training dynamics of deep neural networks
Sibylle Marcotte, Gabriel Peyré, Rémi Gribonval
AI4CE · 10 Aug 2025

Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization (International Joint Conference on Artificial Intelligence, 2025)
Haidong Kang, Lianbo Ma, Guo-Ding Yu, Shangce Gao
MQ · 05 Aug 2025

The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Enric Boix-Adserà, Neil Rohit Mallinar, James B. Simon, M. Belkin
MLT · 08 Jul 2025

Understanding Learning Invariance in Deep Linear Networks
Hao Duan, Guido Montúfar
16 Jun 2025

Symmetry in Neural Network Parameter Spaces
Bo Zhao, Robin Walters, Rose Yu
16 Jun 2025

Information-Theoretic Framework for Understanding Modern Machine-Learning
M. Feder, Ruediger Urbanke, Yaniv Fogel
09 Jun 2025
Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
Geonhui Yoo, Minhak Song, Chulhee Yun
FAtt · 07 Jun 2025

Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
06 Jun 2025

PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He
03 Jun 2025

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang, Bingcong Li, G. Giannakis
24 May 2025

Backward Oversmoothing: why is it hard to train deep Graph Neural Networks?
Nicolas Keriven
22 May 2025

A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal
16 May 2025

Towards Efficient Training of Graph Neural Networks: A Multiscale Approach
Eshed Gal, Moshe Eliasof, Carola-Bibiane Schönlieb, Eldad Haber, Eran Treister
GNN, AI4CE · 25 Mar 2025

Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo (International Conference on Learning Representations, 2025)
Hyunsu Kim, G. Nam, Chulhee Yun, Hongseok Yang, Juho Lee
BDL, UQ, CV · 02 Mar 2025

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam, Seok Hyeong Lee, Clementine Domine, Yea Chan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee
AI4CE · 28 Feb 2025
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
MLT · 27 Jan 2025

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth
MLT · 15 Jan 2025

Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data
Alec S. Xu, Can Yaras, Peng Wang, Q. Qu
04 Jan 2025

Offline Stochastic Optimization of Black-Box Objective Functions
Juncheng Dong, Zihao Wu, Hamid Jafarkhani, Ali Pezeshki, Vahid Tarokh
OffRL · 03 Dec 2024

ExpTest: Automating Learning Rate Searching and Tuning with Insights from Linearized Neural Networks
Zan Chaudhry, Naoko Mizuno
25 Nov 2024

A new Input Convex Neural Network with application to options pricing
Vincent Lemaire, Gilles Pagès, Christian Yeo
19 Nov 2024

Multi-layer matrix factorization for cancer subtyping using full and partial multi-omics dataset
Yingxuan Ren, Fengtao Ren, Bo Yang
18 Nov 2024

How to Defend Against Large-scale Model Poisoning Attacks in Federated Learning: A Vertical Solution
Jinbo Wang, Ruijin Wang, Fengli Zhang
FedML, AAML · 16 Nov 2024

The Persistence of Neural Collapse Despite Low-Rank Bias
Connall Garrod, Jonathan P. Keating
30 Oct 2024
Plastic Learning with Deep Fourier Features (International Conference on Learning Representations, 2024)
Alex Lewandowski, Dale Schuurmans, Marlos C. Machado
CLL · 27 Oct 2024

On the Crucial Role of Initialization for Matrix Factorization (International Conference on Learning Representations, 2024)
Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He
24 Oct 2024

Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems (Neural Information Processing Systems, 2024)
Bingcong Li, Liang Zhang, Niao He
18 Oct 2024

Towards Sharper Risk Bounds for Minimax Problems (International Joint Conference on Artificial Intelligence, 2024)
Bowei Zhu, Shaojie Li, Yong Liu
11 Oct 2024

Swing-by Dynamics in Concept Learning and Compositional Generalization (International Conference on Learning Representations, 2024)
Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka
CoGe, DiffM · 10 Oct 2024

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient (International Conference on Learning Representations, 2024)
George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet
OffRL · 03 Oct 2024

Unifying back-propagation and forward-forward algorithms through model predictive control
Lianhai Ren, Qianxiao Li
29 Sep 2024

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks (International Conference on Learning Representations, 2024)
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe
22 Sep 2024
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
Nadav Cohen, Noam Razin
25 Aug 2024

Beyond Over-smoothing: Uncovering the Trainability Challenges in Deep Graph Neural Networks (International Conference on Information and Knowledge Management, 2024)
Jie Peng, Runlin Lei, Zhewei Wei
07 Aug 2024

Spring-block theory of feature learning in deep neural networks
Chengzhi Shi, Liming Pan, Ivan Dokmanić
AI4CE · 28 Jul 2024

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio
MLT · 17 Jun 2024

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
Can Yaras, Peng Wang, Laura Balzano, Qing Qu
AI4CE · 06 Jun 2024

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Zhenfeng Tu, Santiago Aranguri, Arthur Jacot
27 May 2024

Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion, Lénaic Chizat
ODL · 22 May 2024

Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
21 May 2024