
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

4 October 2018

Papers citing "A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks"

50 / 209 papers shown

Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent Journal of Statistical Planning and Inference (JSPI), 2024 Michael Kohler A. Krzyżak Benjamin Walter 195 1 0 13 May 2024
Masks, Signs, And Learning Rate Rewinding Advait Gadhikar R. Burkholz 204 14 0 29 Feb 2024
Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion Shuqi Ke Charlie Hou Giulia Fanti Sewoong Oh 187 5 0 29 Feb 2024
Sobolev Training for Operator Learning Namkyeong Cho Junseung Ryu Hyung Ju Hwang 95 1 0 14 Feb 2024
Estimating the Local Learning Coefficient at Scale Zach Furman Edmund Lau 162 4 0 06 Feb 2024
Linear Recursive Feature Machines provably recover low-rank matrices Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024 Adityanarayanan Radhakrishnan Misha Belkin Dmitriy Drusvyatskiy 244 12 0 09 Jan 2024
Understanding Unimodal Bias in Multimodal Deep Linear Networks International Conference on Machine Learning (ICML), 2023 Yedi Zhang Peter E. Latham Andrew Saxe 220 11 0 01 Dec 2023
Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks IEEE Signal Processing Letters (IEEE SPL), 2023 Zhen Qin Xuwei Tan Zhihui Zhu 275 1 0 24 Nov 2023
Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization Selina Drews Michael Kohler 153 4 0 24 Nov 2023
Fast-ELECTRA for Efficient Pre-training International Conference on Learning Representations (ICLR), 2023 Chengyu Dong Liyuan Liu Hao Cheng Jingbo Shang Jianfeng Gao Xiaodong Liu 188 2 0 11 Oct 2023
Are GATs Out of Balance? Neural Information Processing Systems (NeurIPS), 2023 Nimrah Mustafa Aleksandar Bojchevski R. Burkholz 294 8 0 11 Oct 2023
Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition Zhongtian Chen Edmund Lau Jake Mendel Susan Wei Daniel Murfet 118 21 0 10 Oct 2023
Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion International Conference on Learning Representations (ICLR), 2023 Alexandru Meterez Amir Joudaki Francesco Orabona Alexander Immer Gunnar Rätsch Hadi Daneshmand 173 8 0 03 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention International Conference on Learning Representations (ICLR), 2023 Yuandong Tian Yiping Wang Zhenyu Zhang Beidi Chen Simon Shaolei Du 266 45 0 01 Oct 2023
Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization International Conference on Learning Representations (ICLR), 2023 Hancheng Min Enrique Mallada René Vidal MLT 218 26 0 24 Jul 2023
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows Neural Information Processing Systems (NeurIPS), 2023 Sibylle Marcotte Rémi Gribonval Gabriel Peyré 289 27 0 30 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage Yu Gui Cong Ma Yiqiao Zhong 165 8 0 06 Jun 2023
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks Can Yaras Peng Wang Wei Hu Zhihui Zhu Laura Balzano Qing Qu 253 19 0 01 Jun 2023
Pruning at Initialization -- A Sketching Perspective IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 Noga Bar Raja Giryes 226 1 0 27 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer Neural Information Processing Systems (NeurIPS), 2023 Yuandong Tian Yiping Wang Beidi Chen S. Du MLT 368 96 0 25 May 2023
Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank International Conference on Learning Representations (ICLR), 2023 Zihan Wang Arthur Jacot 216 23 0 25 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond International Conference on Machine Learning (ICML), 2023 Itai Kreisler Mor Shpigel Nacson Daniel Soudry Y. Carmon 183 16 0 22 May 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss International Conference on Machine Learning (ICML), 2023 Pierre Bréchet Katerina Papagiannouli Jing An Guido Montúfar 338 7 0 06 Mar 2023
Bayesian Interpolation with Deep Linear Networks Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2022 Boris Hanin Alexander Zlokapa 333 28 0 29 Dec 2022
Effects of Data Geometry in Early Deep Learning Neural Information Processing Systems (NeurIPS), 2022 Saket Tiwari George Konidaris 297 7 0 29 Dec 2022
A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization JIAN-PENG Cao Chao Qian Yihui Huang Dicheng Chen Yuncheng Gao Jiyang Dong D. Guo X. Qu 301 1 0 29 Dec 2022
Asymptotic Analysis of Deep Residual Networks R. Cont Alain Rossier Renyuan Xu 137 4 0 15 Dec 2022
Infinite-width limit of deep linear neural networks Communications on Pure and Applied Mathematics (CPAM), 2022 Lénaïc Chizat Maria Colombo Xavier Fernández-Real Alessio Figalli 162 21 0 29 Nov 2022
Finite Sample Identification of Wide Shallow Neural Networks with Biases M. Fornasier T. Klock Marco Mondelli Michael Rauchensteiner 157 7 0 08 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow International Conference on Learning Representations (ICLR), 2022 Bo Zhao I. Ganev Robin Walters Rose Yu Nima Dehmamy 308 26 0 31 Oct 2022
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets International Conference on Learning Representations (ICLR), 2022 Edo Cohen-Karlik Itamar Menuhin-Gruman Raja Giryes Nadav Cohen Amir Globerson 299 7 0 25 Oct 2022
Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit SIAM Journal on Applied Dynamical Systems (SIADS), 2022 Nadav Cohen Govind Menon Zsolt Veraszto ODL 155 11 0 22 Oct 2022
TiDAL: Learning Training Dynamics for Active Learning IEEE International Conference on Computer Vision (ICCV), 2022 Seong Min Kye Kwanghee Choi Hyeongmin Byun Buru Chang 335 20 0 13 Oct 2022
On skip connections and normalisation layers in deep optimisation Neural Information Processing Systems (NeurIPS), 2022 L. MacDonald Jack Valmadre Hemanth Saratchandran Simon Lucey ODL 337 4 0 10 Oct 2022
Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition International Conference on Learning Representations (ICLR), 2022 Jianhao Ma Li-Zhen Guo Salar Fattahi 308 4 0 01 Oct 2022
Implicit Full Waveform Inversion with Deep Neural Representation Jian Sun K. Innanen AI4CE 155 55 0 08 Sep 2022
Intersection of Parallels as an Early Stopping Criterion International Conference on Information and Knowledge Management (CIKM), 2022 Ali Vardasbi Maarten de Rijke Mostafa Dehghani MoMe 123 7 0 19 Aug 2022
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks International Conference on Machine Learning (ICML), 2022 Andrew M. Saxe Shagun Sodhani Sam Lewallen AI4CE 166 43 0 21 Jul 2022
A note on Linear Bottleneck networks and their Transition to Multilinearity Libin Zhu Parthe Pandit M. Belkin MLT 151 0 0 30 Jun 2022
Analysis of Branch Specialization and its Application in Image Decomposition Jonathan Brokman Guy Gilboa 98 2 0 12 Jun 2022
Explicit Regularization in Overparametrized Models via Noise Injection International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 Antonio Orvieto Anant Raj Hans Kersting Francis R. Bach 124 33 0 09 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization Vignesh Kothapalli 331 101 0 08 Jun 2022
Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning International Conference on Learning Representations (ICLR), 2022 Yuandong Tian MLT 256 17 0 02 Jun 2022
Blind Estimation of a Doubly Selective OFDM Channel: A Deep Learning Algorithm and Theory T. Getu N. Golmie D. Griffith 156 2 0 30 May 2022
Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks Neural Information Processing Systems (NeurIPS), 2022 Blake Bordelon Cengiz Pehlevan MLT 263 108 0 19 May 2022
A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks Information Sciences (Inf. Sci.), 2022 Xin Liu Wei Tao Zhisong Pan 80 11 0 18 Apr 2022
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks Social Science Research Network (SSRN), 2022 R. Cont Alain Rossier Renyuan Xu MLT 285 6 0 14 Apr 2022
Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries International Conference on Information and Knowledge Management (CIKM), 2022 Haekyu Park Seongmin Lee Benjamin Hoover Austin P. Wright Omar Shaikh Rahul Duggal Nilaksh Das Kevin Wenliang Li Judy Hoffman Duen Horng Chau 240 3 0 30 Mar 2022
Convergence of gradient descent for deep neural networks S. Chatterjee ODL 228 27 0 30 Mar 2022
On the Implicit Bias of Gradient Descent for Temporal Extrapolation International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 Edo Cohen-Karlik Avichai Ben David Nadav Cohen Amir Globerson 140 4 0 09 Feb 2022