1806.00900
Cited By
v1
v2 (latest)
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
4 June 2018
S. Du
Wei Hu
Jason D. Lee
MLT
Papers citing
"Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced"
50 / 119 papers shown
Title
A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Min Gan
Guang-yong Chen
Yang Yi
Lin Yang
56
0
0
03 Nov 2025
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang
Guido Montúfar
155
0
0
29 Sep 2025
Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
Tianxiao Cao
Kyohei Atarashi
H. Kashima
170
0
0
14 Aug 2025
Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
Jiaxin Deng
Qingcheng Zhu
Junbiao Pang
Linlin Yang
Zhongqian Fu
Baochang Zhang
81
0
0
01 Aug 2025
Symmetry in Neural Network Parameter Spaces
Bo Zhao
Robin Walters
Rose Yu
261
6
0
16 Jun 2025
Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte
Rémi Gribonval
Gabriel Peyré
204
3
0
06 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
281
3
0
06 Jun 2025
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion
Liang Zhang
Bingcong Li
Niao He
188
3
0
03 Jun 2025
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang
Bingcong Li
G. Giannakis
494
2
0
24 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu
Hancheng Min
Salma Tarmoun
Enrique Mallada
Rene Vidal
225
2
0
16 May 2025
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu
Zixuan Zhang
S. Du
T. Zhao
263
1
0
04 Mar 2025
Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij
Yasin Abbasi-Yadkori
263
1
0
24 Feb 2025
The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Annual Conference Computational Learning Theory (COLT), 2025
Sholom Schechtman
Nicolas Schreuder
953
0
0
08 Feb 2025
k-SVD with Gradient Descent
Yassir Jedra
372
0
0
01 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
494
5
0
01 Feb 2025
Algebra Unveils Deep Learning -- An Invitation to Neuroalgebraic Geometry
Giovanni Luca Marchetti
Vahid Shahverdi
Stefano Mereta
Matthew Trager
Kathlén Kohn
296
2
0
31 Jan 2025
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang
Aaditya K. Singh
Peter E. Latham
Andrew Saxe
MLT
255
19
0
27 Jan 2025
Geometry and Optimization of Shallow Polynomial Networks
Yossi Arjevani
Joan Bruna
Joe Kileel
Elzbieta Polak
Matthew Trager
216
4
0
10 Jan 2025
How Feature Learning Can Improve Neural Scaling Laws
International Conference on Learning Representations (ICLR), 2024
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
386
32
0
26 Sep 2024
In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting
AAAI Conference on Artificial Intelligence (AAAI), 2024
Constantin Philippenko
Kevin Scaman
Laurent Massoulié
FedML
289
4
0
13 Sep 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
254
12
0
21 Aug 2024
Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar
R. Burkholz
204
14
0
29 Feb 2024
Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
Mingyang Yi
Bohan Wang
253
0
0
24 Jan 2024
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang
Zhenghao Xu
Tuo Zhao
Molei Tao
274
16
0
26 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
International Conference on Learning Representations (ICLR), 2023
Nuoya Xiong
Lijun Ding
Simon S. Du
394
18
0
03 Oct 2023
Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations
J. S. Wind
193
2
0
04 Sep 2023
Trained Transformers Learn Linear Models In-Context
Journal of machine learning research (JMLR), 2023
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
333
270
0
16 Jun 2023
Neural (Tangent Kernel) Collapse
Neural Information Processing Systems (NeurIPS), 2023
Mariia Seleznova
Dana Weitzner
Raja Giryes
Gitta Kutyniok
H. Chou
248
14
0
25 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
International Conference on Machine Learning (ICML), 2023
Itai Kreisler
Mor Shpigel Nacson
Daniel Soudry
Y. Carmon
179
16
0
22 May 2023
Convergence of Alternating Gradient Descent for Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2023
R. Ward
T. Kolda
207
12
0
11 May 2023
On the Stepwise Nature of Self-Supervised Learning
International Conference on Machine Learning (ICML), 2023
James B. Simon
Maksis Knutins
Liu Ziyin
Daniel Geisz
Abraham J. Fetterman
Joshua Albrecht
SSL
216
38
0
27 Mar 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
International Conference on Machine Learning (ICML), 2023
Pierre Bréchet
Katerina Papagiannouli
Jing An
Guido Montúfar
338
7
0
06 Mar 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Annual Conference Computational Learning Theory (COLT), 2023
Weihang Xu
S. Du
259
21
0
20 Feb 2023
How to prepare your task head for finetuning
International Conference on Learning Representations (ICLR), 2023
Yi Ren
Shangmin Guo
Wonho Bae
Danica J. Sutherland
110
18
0
11 Feb 2023
Implicit Regularization for Group Sparsity
International Conference on Learning Representations (ICLR), 2023
Jiangyuan Li
Thanh Van Nguyen
Chinmay Hegde
Raymond K. W. Wong
201
11
0
29 Jan 2023
Effects of Data Geometry in Early Deep Learning
Neural Information Processing Systems (NeurIPS), 2022
Saket Tiwari
George Konidaris
289
7
0
29 Dec 2022
Improved Convergence Guarantees for Shallow Neural Networks
A. Razborov
ODL
181
1
0
05 Dec 2022
Infinite-width limit of deep linear neural networks
Communications on Pure and Applied Mathematics (CPAM), 2022
Lénaïc Chizat
Maria Colombo
Xavier Fernández-Real
Alessio Figalli
162
21
0
29 Nov 2022
Mechanistic Mode Connectivity
International Conference on Machine Learning (ICML), 2022
Ekdeep Singh Lubana
Eric J. Bigelow
Robert P. Dick
David M. Krueger
Hidenori Tanaka
246
56
0
15 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
International Conference on Learning Representations (ICLR), 2022
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
304
26
0
31 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
International Conference on Machine Learning (ICML), 2022
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
296
67
0
25 Oct 2022
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
International Conference on Learning Representations (ICLR), 2022
Yoonho Lee
Annie S. Chen
Fahim Tajwar
Ananya Kumar
Huaxiu Yao
Abigail Z. Jacobs
Chelsea Finn
OOD
250
250
0
20 Oct 2022
Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Haotian Ye
James Zou
Linjun Zhang
OOD
303
27
0
20 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
A. K. Akash
Sixu Li
Nicolas García Trillos
178
15
0
13 Oct 2022
Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization
British Machine Vision Conference (BMVC), 2022
Ziquan Liu
Antoni B. Chan
AAML
138
6
0
11 Oct 2022
Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
International Conference on Learning Representations (ICLR), 2022
Arthur Jacot
310
36
0
29 Sep 2022
Magnitude and Angle Dynamics in Training Single ReLU Neurons
Neural Networks (NN), 2022
Sangmin Lee
Byeongsu Sim
Jong Chul Ye
MLT
303
6
0
27 Sep 2022
A Validation Approach to Over-parameterized Matrix and Image Recovery
Lijun Ding
Zhen Qin
Liwei Jiang
Jinxin Zhou
Zhihui Zhu
332
15
0
21 Sep 2022
Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)
Neural Information Processing Systems (NeurIPS), 2022
Zhenyu Zhu
Fanghui Liu
Grigorios G. Chrysos
Volkan Cevher
258
23
0
15 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms
Communications of the ACM (CACM), 2022
Gal Vardi
FedML
AI4CE
292
107
0
26 Aug 2022