Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion
arXiv:2106.09524 · 17 June 2021
Papers citing "Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity" (20 papers shown)
| Title | Authors | Tags | Likes | Citations | Comments | Date |
|---|---|---|---|---|---|---|
| Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations | Yize Zhao, Tina Behnia, V. Vakilian, Christos Thrampoulidis | | 55 | 8 | 0 | 20 Feb 2025 |
| Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries | Chris Kolb, T. Weber, Bernd Bischl, David Rügamer | | 100 | 0 | 0 | 04 Feb 2025 |
| Optimization Insights into Deep Diagonal Linear Networks | Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega | | 66 | 0 | 0 | 21 Dec 2024 |
| The Optimization Landscape of SGD Across the Feature Learning Strength | Alexander B. Atanasov, Alexandru Meterez, James B. Simon, C. Pehlevan | | 43 | 2 | 0 | 06 Oct 2024 |
| Mask in the Mirror: Implicit Sparsification | Tom Jacobs, R. Burkholz | | 40 | 3 | 0 | 19 Aug 2024 |
| Neural Redshift: Random Networks are not Random Functions | Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad | | 89 | 18 | 0 | 04 Mar 2024 |
| The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks | Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu | MLT | 24 | 5 | 0 | 20 Jun 2023 |
| Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond | Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon | | 23 | 13 | 0 | 22 May 2023 |
| Saddle-to-Saddle Dynamics in Diagonal Linear Networks | Scott Pesme, Nicolas Flammarion | | 17 | 35 | 0 | 02 Apr 2023 |
| Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning | Antonio Sclocchi, Mario Geiger, M. Wyart | | 16 | 6 | 0 | 31 Jan 2023 |
| Infinite-width limit of deep linear neural networks | Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli | | 14 | 14 | 0 | 29 Nov 2022 |
| On the Implicit Bias in Deep-Learning Algorithms | Gal Vardi | FedML, AI4CE | 25 | 72 | 0 | 26 Aug 2022 |
| Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent | Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora | | 25 | 27 | 0 | 08 Jul 2022 |
| Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation | Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion | NoLa | 24 | 31 | 0 | 20 Jun 2022 |
| Towards Understanding Sharpness-Aware Minimization | Maksym Andriushchenko, Nicolas Flammarion | AAML | 19 | 131 | 0 | 13 Jun 2022 |
| High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation | Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang | MLT | 11 | 120 | 0 | 03 May 2022 |
| Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization | I. Zaghloul Amir, Roi Livni, Nathan Srebro | | 17 | 6 | 0 | 27 Feb 2022 |
| Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks | Noam Razin, Asaf Maman, Nadav Cohen | | 28 | 29 | 0 | 27 Jan 2022 |
| A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval | Fan Wu, Patrick Rebeschini | | 25 | 14 | 0 | 20 Oct 2020 |
| On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 273 | 2,878 | 0 | 15 Sep 2016 |