
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

12 March 2024

Papers citing "Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations"

36 / 36 papers shown

Title
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics Zheng-an Chen Tao Luo AI4CE 128 1 0 08 Oct 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks D. Kunin Giovanni Luca Marchetti F. Chen Dhruva Karkada James B. Simon M. DeWeese Surya Ganguli Nina Miolane 385 4 0 06 Jun 2025
An overview of condensation phenomenon in deep learning Zhi-Qin John Xu Yaoyu Zhang Zhangchen Zhou AI4CE 214 11 0 13 Apr 2025
Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks Ke Chen Chugang Yi Haizhao Yang MLT 151 2 0 03 Oct 2024
Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks Akshay Kumar Jarvis Haupt ODL 185 10 0 14 Feb 2024
Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization International Conference on Learning Representations (ICLR), 2023 Hancheng Min Enrique Mallada René Vidal MLT 242 28 0 24 Jul 2023
Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks Neural Information Processing Systems (NeurIPS), 2023 Mingze Wang Chao Ma 134 16 0 21 May 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing International Conference on Machine Learning (ICML), 2023 Jikai Jin Zhiyuan Li Kaifeng Lyu S. Du Jason D. Lee MLT 258 44 0 27 Jan 2023
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs Neural Information Processing Systems (NeurIPS), 2022 Etienne Boursier Loucas Pillaud-Vivien Nicolas Flammarion ODL 260 73 0 02 Jun 2022
Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width Neural Information Processing Systems (NeurIPS), 2022 Hanxu Zhou Qixuan Zhou Zhenyuan Jin Tao Luo Yaoyu Zhang Zhi-Qin John Xu 220 22 0 24 May 2022
Neural Networks as Kernel Learners: The Silent Alignment Effect International Conference on Learning Representations (ICLR), 2021 Alexander B. Atanasov Blake Bordelon Cengiz Pehlevan MLT 362 94 0 29 Oct 2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias Kaifeng Lyu Zhiyuan Li Runzhe Wang Sanjeev Arora MLT 228 81 0 26 Oct 2021
Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity Arthur Jacot François Ged Berfin Şimşek Clément Hongler Franck Gabriel 312 65 0 30 Jun 2021
Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction Neural Information Processing Systems (NeurIPS), 2021 Dominik Stöger Mahdi Soltanolkotabi ODL 349 86 0 28 Jun 2021
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning International Conference on Learning Representations (ICLR), 2020 Zhiyuan Li Yuping Luo Kaifeng Lyu 201 143 0 17 Dec 2020
Phase diagram for two-layer ReLU neural networks at infinite-width limit Journal of Machine Learning Research (JMLR), 2020 Tao Luo Zhi-Qin John Xu Zheng Ma Yaoyu Zhang 198 71 0 15 Jul 2020
Directional convergence and alignment in deep learning Neural Information Processing Systems (NeurIPS), 2020 Ziwei Ji Matus Telgarsky 228 198 0 11 Jun 2020
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss Annual Conference Computational Learning Theory (COLT), 2020 Lénaïc Chizat Francis R. Bach MLT 582 364 0 11 Feb 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library Neural Information Processing Systems (NeurIPS), 2019 Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury ... Sasank Chilamkurthy Benoit Steiner Lu Fang Junjie Bai Soumith Chintala ODL 956 48,276 0 03 Dec 2019
Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning Mathematical Programming (Math. Program.), 2019 Jérôme Bolte Edouard Pauwels 433 152 0 23 Sep 2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks International Conference on Learning Representations (ICLR), 2019 Kaifeng Lyu Jian Li 482 363 0 13 Jun 2019
Kernel and Rich Regimes in Overparametrized Models Annual Conference Computational Learning Theory (COLT), 2019 Blake E. Woodworth Suriya Gunasekar Pedro H. P. Savarese E. Moroshko Itay Golan Jason D. Lee Daniel Soudry Nathan Srebro 323 390 0 13 Jun 2019
Implicit Regularization in Deep Matrix Factorization Neural Information Processing Systems (NeurIPS), 2019 Sanjeev Arora Nadav Cohen Wei Hu Yuping Luo AI4CE 352 557 0 31 May 2019
Approximation spaces of deep neural networks Constructive Approximation (Constr. Approx.), 2019 Rémi Gribonval Gitta Kutyniok M. Nielsen Felix Voigtländer 198 137 0 03 May 2019
On Exact Computation with an Infinitely Wide Neural Net Sanjeev Arora S. Du Wei Hu Zhiyuan Li Ruslan Salakhutdinov Ruosong Wang 597 985 0 26 Apr 2019
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit Song Mei Theodor Misiakiewicz Andrea Montanari MLT 303 300 0 16 Feb 2019
On Lazy Training in Differentiable Programming Lénaïc Chizat Edouard Oyallon Francis R. Bach 514 903 0 19 Dec 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks Arthur Jacot Franck Gabriel Clément Hongler 1.8K 3,638 0 20 Jun 2018
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport Lénaïc Chizat Francis R. Bach OT 383 794 0 24 May 2018
Gradient Descent Quantizes ReLU Network Features Hartmut Maennel Olivier Bousquet Sylvain Gelly MLT 134 88 0 22 Mar 2018
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks Mahdi Soltanolkotabi Adel Javanmard Jason D. Lee 499 435 0 16 Jul 2017
Recovery Guarantees for One-hidden-layer Neural Networks International Conference on Machine Learning (ICML), 2017 Kai Zhong Zhao Song Prateek Jain Peter L. Bartlett Inderjit S. Dhillon MLT 345 344 0 10 Jun 2017
Implicit Regularization in Matrix Factorization Suriya Gunasekar Blake E. Woodworth Srinadh Bhojanapalli Behnam Neyshabur Nathan Srebro 258 527 0 25 May 2017
Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with $\ell^1$ and $\ell^0$ Controls Jason M. Klusowski Andrew R. Barron 500 164 0 26 Jul 2016
On the Computational Efficiency of Training Neural Networks Neural Information Processing Systems (NeurIPS), 2014 Roi Livni Shai Shalev-Shwartz Ohad Shamir 364 490 0 05 Oct 2014
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks International Conference on Learning Representations (ICLR), 2013 Andrew M. Saxe James L. McClelland Surya Ganguli ODL 1.0K 1,964 0 20 Dec 2013