On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

24 May 2018

Papers citing "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport"

50 / 161 papers shown

Title
Convergence of Time-Averaged Mean Field Gradient Descent Dynamics for Continuous Multi-Player Zero-Sum Games Yulong Lu Pierre Monmarché MLT 29 0 0 12 May 2025
Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime Francesco Camilli D. Tieplova Eleonora Bergamin Jean Barbier 106 0 0 06 May 2025
Ergodic Generative Flows Leo Maxime Brunswic Mateo Clemente Rui Heng Yang Adam Sigal Amir Rasouli Yinchuan Li 42 0 0 06 May 2025
Mirror Mean-Field Langevin Dynamics Anming Gu Juno Kim 31 0 0 05 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers Nolan Dey Bin Claire Zhang Lorenzo Noci Mufan Bill Li Blake Bordelon Shane Bergsma C. Pehlevan Boris Hanin Joel Hestness 39 0 0 02 May 2025
Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime Raphael Barboni Gabriel Peyré François-Xavier Vialard MLT 34 0 0 25 Apr 2025
Statistically guided deep learning Michael Kohler A. Krzyżak ODL BDL 68 0 0 11 Apr 2025
Fractal and Regular Geometry of Deep Neural Networks Simmaco Di Lillo Domenico Marinucci Michele Salvi S. Vigogna MDE AI4CE 31 0 0 08 Apr 2025
DDEQs: Distributional Deep Equilibrium Models through Wasserstein Gradient Flows Jonathan Geuter Clément Bonet Anna Korba David Alvarez-Melis 56 0 0 03 Mar 2025
Geometry and Optimization of Shallow Polynomial Networks Yossi Arjevani Joan Bruna Joe Kileel Elzbieta Polak Matthew Trager 34 1 0 10 Jan 2025
Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input Ziang Chen Rong Ge MLT 59 1 0 10 Jan 2025
Non-geodesically-convex optimization in the Wasserstein space Hoang Phuc Hau Luu Hanlin Yu Bernardo Williams Petrus Mikkola Marcelo Hartmann Kai Puolamaki Arto Klami 53 2 0 08 Jan 2025
Emergence of meta-stable clustering in mean-field transformer models Giuseppe Bruno Federico Pasqualotto Andrea Agazzi 45 6 0 30 Oct 2024
Robust Feature Learning for Multi-Index Models in High Dimensions Alireza Mousavi-Hosseini Adel Javanmard Murat A. Erdogdu OOD AAML 42 1 0 21 Oct 2024
Extended convexity and smoothness and their applications in deep learning Binchuan Qi Wei Gong Li Li 61 0 0 08 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength Alexander B. Atanasov Alexandru Meterez James B. Simon C. Pehlevan 43 2 0 06 Oct 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks Clémentine Dominé Nicolas Anguita A. Proca Lukas Braun D. Kunin P. Mediano Andrew M. Saxe 32 3 0 22 Sep 2024
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics Alireza Mousavi-Hosseini Denny Wu Murat A. Erdogdu MLT AI4CE 27 6 0 14 Aug 2024
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning Arthur Jacot Seok Hoan Choi Yuxiao Wen AI4CE 88 2 0 08 Jul 2024
Symmetries in Overparametrized Neural Networks: A Mean-Field View Javier Maass Martínez Joaquin Fontbona FedML MLT 38 2 0 30 May 2024
Infinite Limits of Multi-head Transformer Dynamics Blake Bordelon Hamza Tahir Chaudhry C. Pehlevan AI4CE 42 9 0 24 May 2024
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions Luca Arnaboldi Yatin Dandi Florent Krzakala Luca Pesce Ludovic Stephan 61 12 0 24 May 2024
Convergence analysis of controlled particle systems arising in deep learning: from finite to infinite sample size Huafu Liao Alpár R. Mészáros Chenchen Mou Chao Zhou 26 2 0 08 Apr 2024
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport Raphael Barboni Gabriel Peyré François-Xavier Vialard 32 3 0 19 Mar 2024
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations Akshay Kumar Jarvis D. Haupt ODL 44 3 0 12 Mar 2024
Mean-field underdamped Langevin dynamics and its spacetime discretization Qiang Fu Ashia Wilson 34 4 0 26 Dec 2023
Learning a Sparse Representation of Barron Functions with the Inverse Scale Space Flow T. J. Heeringa Tim Roith Christoph Brune Martin Burger 11 0 0 05 Dec 2023
Accelerating optimization over the space of probability measures Shi Chen Wenxuan Wu Yuhang Yao Stephen J. Wright 26 4 0 06 Oct 2023
Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization Mastane Achab 20 1 0 26 Sep 2023
Gradient-Based Feature Learning under Structured Data Alireza Mousavi-Hosseini Denny Wu Taiji Suzuki Murat A. Erdogdu MLT 32 18 0 07 Sep 2023
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences Samuel Chun-Hei Lam Justin A. Sirignano K. Spiliopoulos 24 2 0 28 Aug 2023
Nonlinear Hamiltonian Monte Carlo & its Particle Approximation Nawaf Bou-Rabee Katharina Schuh 23 7 0 22 Aug 2023
Quantitative CLTs in Deep Neural Networks Stefano Favaro Boris Hanin Domenico Marinucci I. Nourdin G. Peccati BDL 23 11 0 12 Jul 2023
Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference Arnaud Descours Tom Huix Arnaud Guillin Manon Michel Eric Moulines Boris Nectoux BDL 29 1 0 10 Jul 2023
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions Nishil Patel Sebastian Lee Stefano Sarao Mannelli Sebastian Goldt Andrew Saxe OffRL 25 3 0 17 Jun 2023
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks Puyu Wang Yunwen Lei Di Wang Yiming Ying Ding-Xuan Zhou MLT 27 3 0 26 May 2023
Understanding the Initial Condensation of Convolutional Neural Networks Zhangchen Zhou Hanxu Zhou Yuqing Li Zhi-Qin John Xu MLT AI4CE 23 5 0 17 May 2023
Performative Prediction with Neural Networks Mehrnaz Mofakhami Ioannis Mitliagkas Gauthier Gidel 40 16 0 14 Apr 2023
Full Gradient Deep Reinforcement Learning for Average-Reward Criterion Tejas Pagare Vivek Borkar Konstantin Avrachenkov 24 4 0 07 Apr 2023
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks Blake Bordelon C. Pehlevan MLT 38 29 0 06 Apr 2023
High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance Krishnakumar Balasubramanian Promit Ghosal Ye He 28 5 0 03 Apr 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality François Ged M. H. Veiga 21 0 0 22 Mar 2023
Global Optimality of Elman-type RNN in the Mean-Field Regime Andrea Agazzi Jian-Xiong Lu Sayan Mukherjee MLT 26 1 0 12 Mar 2023
Phase Diagram of Initial Condensation for Two-layer Neural Networks Zheng Chen Yuqing Li Tao Luo Zhaoguang Zhou Z. Xu MLT AI4CE 43 8 0 12 Mar 2023
Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems Atsushi Nitanda Kazusato Oko Denny Wu Nobuhito Takenouchi Taiji Suzuki 24 3 0 06 Mar 2023
Learning time-scales in two-layers neural networks Raphael Berthier Andrea Montanari Kangjie Zhou 36 33 0 28 Feb 2023
Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance Yifan Chen Daniel Zhengyu Huang Jiaoyang Huang Sebastian Reich Andrew M. Stuart 11 17 0 21 Feb 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron Weihang Xu S. Du 29 16 0 20 Feb 2023
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization Agustinus Kristiadi Felix Dangel Philipp Hennig 26 11 0 14 Feb 2023
Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent Benjamin Gess Sebastian Kassing Vitalii Konarovskyi DiffM 26 6 0 14 Feb 2023