Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion
arXiv:2106.09524 · 17 June 2021
Papers citing "Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity" (20 papers shown)
| Title | Authors | Tags | Likes | Citations | Comments | Date |
|---|---|---|---|---|---|---|
| Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations | Yize Zhao, Tina Behnia, V. Vakilian, Christos Thrampoulidis | | 55 | 8 | 0 | 20 Feb 2025 |
| Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries | Chris Kolb, T. Weber, Bernd Bischl, David Rügamer | | 100 | 0 | 0 | 04 Feb 2025 |
| Optimization Insights into Deep Diagonal Linear Networks | Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega | | 66 | 0 | 0 | 21 Dec 2024 |
| The Optimization Landscape of SGD Across the Feature Learning Strength | Alexander B. Atanasov, Alexandru Meterez, James B. Simon, C. Pehlevan | | 43 | 2 | 0 | 06 Oct 2024 |
| Mask in the Mirror: Implicit Sparsification | Tom Jacobs, R. Burkholz | | 40 | 3 | 0 | 19 Aug 2024 |
| Neural Redshift: Random Networks are not Random Functions | Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad | | 89 | 18 | 0 | 04 Mar 2024 |
| The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks | Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu | MLT | 24 | 5 | 0 | 20 Jun 2023 |
| Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond | Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon | | 23 | 13 | 0 | 22 May 2023 |
| Saddle-to-Saddle Dynamics in Diagonal Linear Networks | Scott Pesme, Nicolas Flammarion | | 17 | 35 | 0 | 02 Apr 2023 |
| Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning | Antonio Sclocchi, Mario Geiger, M. Wyart | | 16 | 6 | 0 | 31 Jan 2023 |
| Infinite-width limit of deep linear neural networks | Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli | | 14 | 14 | 0 | 29 Nov 2022 |
| On the Implicit Bias in Deep-Learning Algorithms | Gal Vardi | FedML, AI4CE | 25 | 72 | 0 | 26 Aug 2022 |
| Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent | Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora | | 25 | 27 | 0 | 08 Jul 2022 |
| Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation | Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion | NoLa | 24 | 31 | 0 | 20 Jun 2022 |
| Towards Understanding Sharpness-Aware Minimization | Maksym Andriushchenko, Nicolas Flammarion | AAML | 19 | 131 | 0 | 13 Jun 2022 |
| High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation | Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang | MLT | 11 | 120 | 0 | 03 May 2022 |
| Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization | I. Zaghloul Amir, Roi Livni, Nathan Srebro | | 17 | 6 | 0 | 27 Feb 2022 |
| Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks | Noam Razin, Asaf Maman, Nadav Cohen | | 28 | 29 | 0 | 27 Jan 2022 |
| A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval | Fan Wu, Patrick Rebeschini | | 25 | 14 | 0 | 20 Oct 2020 |
| On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 273 | 2,878 | 0 | 15 Sep 2016 |