Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
arXiv:1806.00900, 4 June 2018
S. Du, Wei Hu, Jason D. Lee
MLT

Papers citing "Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced"

50 of 125 citing papers shown

A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Min Gan, Guang-yong Chen, Yang Yi, Lin Yang
03 Nov 2025

Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar
29 Sep 2025

Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
Tianxiao Cao, Kyohei Atarashi, H. Kashima
14 Aug 2025

Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang
01 Aug 2025

Symmetry in Neural Network Parameter Spaces
Bo Zhao, Robin Walters, Rose Yu
16 Jun 2025

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane
06 Jun 2025

Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
06 Jun 2025

PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He
03 Jun 2025

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang, Bingcong Li, G. Giannakis
24 May 2025

A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal
16 May 2025

A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, S. Du, T. Zhao
04 Mar 2025

Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij, Yasin Abbasi-Yadkori
24 Feb 2025

The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks (COLT 2025)
Sholom Schechtman, Nicolas Schreuder
08 Feb 2025

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, Wenlin Yao
01 Feb 2025

$k$-SVD with Gradient Descent
Yassir Jedra
01 Feb 2025

Algebra Unveils Deep Learning -- An Invitation to Neuroalgebraic Geometry
Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn
31 Jan 2025

Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
MLT
27 Jan 2025

Geometry and Optimization of Shallow Polynomial Networks
Yossi Arjevani, Joan Bruna, Joe Kileel, Elzbieta Polak, Matthew Trager
10 Jan 2025

On subdifferential chain rule of matrix factorization and beyond
Jiewen Guan, Anthony Man-Cho So
AI4CE
07 Oct 2024

How Feature Learning Can Improve Neural Scaling Laws (ICLR 2024)
Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan
26 Sep 2024

In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting (AAAI 2024)
Constantin Philippenko, Kevin Scaman, Laurent Massoulié
FedML
13 Sep 2024

Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro H. P. Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter
21 Aug 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning (NeurIPS 2024)
D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli
MLT
10 Jun 2024

Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar, R. Burkholz
29 Feb 2024

Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
Mingyang Yi, Bohan Wang
24 Jan 2024

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
26 Oct 2023

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization (ICLR 2023)
Nuoya Xiong, Lijun Ding, Simon S. Du
03 Oct 2023

Deep Neural Networks Tend To Extrapolate Predictably (ICLR 2023)
Katie Kang, Amrith Rajagopal Setlur, Claire Tomlin, Sergey Levine
02 Oct 2023

Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations
J. S. Wind
04 Sep 2023

Trained Transformers Learn Linear Models In-Context (JMLR 2023)
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs (NeurIPS 2023)
D. Chistikov, Matthias Englert, R. Lazic
MLT
10 Jun 2023

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems (NeurIPS 2023)
Chaoyue Liu, Dmitriy Drusvyatskiy, M. Belkin, Damek Davis, Yi-An Ma
ODL
05 Jun 2023

Neural (Tangent Kernel) Collapse (NeurIPS 2023)
Mariia Seleznova, Dana Weitzner, Raja Giryes, Gitta Kutyniok, H. Chou
25 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond (ICML 2023)
Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon
22 May 2023

Convergence of Alternating Gradient Descent for Matrix Factorization (NeurIPS 2023)
R. Ward, T. Kolda
11 May 2023

On the Stepwise Nature of Self-Supervised Learning (ICML 2023)
James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht
SSL
27 Mar 2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss (ICML 2023)
Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar
06 Mar 2023

Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron (COLT 2023)
Weihang Xu, S. Du
20 Feb 2023

How to prepare your task head for finetuning (ICLR 2023)
Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland
11 Feb 2023

Implicit Regularization for Group Sparsity (ICLR 2023)
Jiangyuan Li, Thanh Van Nguyen, Chinmay Hegde, Raymond K. W. Wong
29 Jan 2023

Effects of Data Geometry in Early Deep Learning (NeurIPS 2022)
Saket Tiwari, George Konidaris
29 Dec 2022

Improved Convergence Guarantees for Shallow Neural Networks
A. Razborov
ODL
05 Dec 2022

Infinite-width limit of deep linear neural networks (CPAM 2022)
Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli
29 Nov 2022

Mechanistic Mode Connectivity (ICML 2022)
Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka
15 Nov 2022

Symmetries, flat minima, and the conserved quantities of gradient flow (ICLR 2022)
Bo Zhao, I. Ganev, Robin Walters, Rose Yu, Nima Dehmamy
31 Oct 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models (ICML 2022)
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
AI4CE
25 Oct 2022

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts (ICLR 2022)
Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Abigail Z. Jacobs, Chelsea Finn
OOD
20 Oct 2022

Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise (AISTATS 2022)
Haotian Ye, James Zou, Linjun Zhang
OOD
20 Oct 2022

Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
A. K. Akash, Sixu Li, Nicolas García Trillos
13 Oct 2022

Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization (BMVC 2022)
Ziquan Liu, Antoni B. Chan
AAML
11 Oct 2022