arXiv: 1806.00900
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
4 June 2018 · S. Du, Wei Hu, Jason D. Lee
MLT
Papers citing "Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced" (50 of 125 shown)
A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Min Gan, Guang-yong Chen, Yang Yi, Lin Yang · 03 Nov 2025

Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar · 29 Sep 2025

Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
Tianxiao Cao, Kyohei Atarashi, H. Kashima · 14 Aug 2025

Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang · 01 Aug 2025

Symmetry in Neural Network Parameter Spaces
Bo Zhao, Robin Walters, Rose Yu · 16 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane · 06 Jun 2025

Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré · 06 Jun 2025

PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He · 03 Jun 2025

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang, Bingcong Li, G. Giannakis · 24 May 2025

A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal · 16 May 2025
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, S. Du, T. Zhao · 04 Mar 2025

Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij, Yasin Abbasi-Yadkori · 24 Feb 2025

The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Annual Conference Computational Learning Theory (COLT), 2025 · Sholom Schechtman, Nicolas Schreuder · 08 Feb 2025

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, Wenlin Yao · 01 Feb 2025

k-SVD with Gradient Descent
Yassir Jedra · 01 Feb 2025
Algebra Unveils Deep Learning -- An Invitation to Neuroalgebraic Geometry
Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn · 31 Jan 2025

Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe · MLT · 27 Jan 2025

Geometry and Optimization of Shallow Polynomial Networks
Yossi Arjevani, Joan Bruna, Joe Kileel, Elzbieta Polak, Matthew Trager · 10 Jan 2025

On subdifferential chain rule of matrix factorization and beyond
Jiewen Guan, Anthony Man-Cho So · AI4CE · 07 Oct 2024

How Feature Learning Can Improve Neural Scaling Laws
International Conference on Learning Representations (ICLR), 2024 · Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan · 26 Sep 2024

In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting
AAAI Conference on Artificial Intelligence (AAAI), 2024 · Constantin Philippenko, Kevin Scaman, Laurent Massoulié · FedML · 13 Sep 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro H. P. Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter · 21 Aug 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024 · D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli · MLT · 10 Jun 2024

Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar, R. Burkholz · 29 Feb 2024

Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
Mingyang Yi, Bohan Wang · 24 Jan 2024
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao · 26 Oct 2023

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
International Conference on Learning Representations (ICLR), 2023 · Nuoya Xiong, Lijun Ding, Simon S. Du · 03 Oct 2023

Deep Neural Networks Tend To Extrapolate Predictably
International Conference on Learning Representations (ICLR), 2023 · Katie Kang, Amrith Rajagopal Setlur, Claire Tomlin, Sergey Levine · 02 Oct 2023

Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations
J. S. Wind · 04 Sep 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023 · Ruiqi Zhang, Spencer Frei, Peter L. Bartlett · 16 Jun 2023
Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
Neural Information Processing Systems (NeurIPS), 2023 · D. Chistikov, Matthias Englert, R. Lazic · MLT · 10 Jun 2023

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Neural Information Processing Systems (NeurIPS), 2023 · Chaoyue Liu, Dmitriy Drusvyatskiy, M. Belkin, Damek Davis, Yi-An Ma · ODL · 05 Jun 2023

Neural (Tangent Kernel) Collapse
Neural Information Processing Systems (NeurIPS), 2023 · Mariia Seleznova, Dana Weitzner, Raja Giryes, Gitta Kutyniok, H. Chou · 25 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
International Conference on Machine Learning (ICML), 2023 · Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon · 22 May 2023

Convergence of Alternating Gradient Descent for Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2023 · R. Ward, T. Kolda · 11 May 2023
On the Stepwise Nature of Self-Supervised Learning
International Conference on Machine Learning (ICML), 2023 · James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht · SSL · 27 Mar 2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
International Conference on Machine Learning (ICML), 2023 · Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar · 06 Mar 2023

Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Annual Conference Computational Learning Theory (COLT), 2023 · Weihang Xu, S. Du · 20 Feb 2023

How to prepare your task head for finetuning
International Conference on Learning Representations (ICLR), 2023 · Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland · 11 Feb 2023

Implicit Regularization for Group Sparsity
International Conference on Learning Representations (ICLR), 2023 · Jiangyuan Li, Thanh Van Nguyen, Chinmay Hegde, Raymond K. W. Wong · 29 Jan 2023
Effects of Data Geometry in Early Deep Learning
Neural Information Processing Systems (NeurIPS), 2022 · Saket Tiwari, George Konidaris · 29 Dec 2022

Improved Convergence Guarantees for Shallow Neural Networks
A. Razborov · ODL · 05 Dec 2022

Infinite-width limit of deep linear neural networks
Communications on Pure and Applied Mathematics (CPAM), 2022 · Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli · 29 Nov 2022

Mechanistic Mode Connectivity
International Conference on Machine Learning (ICML), 2022 · Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka · 15 Nov 2022

Symmetries, flat minima, and the conserved quantities of gradient flow
International Conference on Learning Representations (ICLR), 2022 · Bo Zhao, I. Ganev, Robin Walters, Rose Yu, Nima Dehmamy · 31 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
International Conference on Machine Learning (ICML), 2022 · Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma · AI4CE · 25 Oct 2022

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
International Conference on Learning Representations (ICLR), 2022 · Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Abigail Z. Jacobs, Chelsea Finn · OOD · 20 Oct 2022

Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 · Haotian Ye, James Zou, Linjun Zhang · OOD · 20 Oct 2022

Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
A. K. Akash, Sixu Li, Nicolas García Trillos · 13 Oct 2022

Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization
British Machine Vision Conference (BMVC), 2022 · Ziquan Liu, Antoni B. Chan · AAML · 11 Oct 2022