arXiv: 1806.00900
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
4 June 2018 · S. Du, Wei Hu, Jason D. Lee
MLT
Papers citing "Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced" (50 of 125 shown)
A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Min Gan, Guang-yong Chen, Yang Yi, Lin Yang · 03 Nov 2025

Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar · 29 Sep 2025

Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
Tianxiao Cao, Kyohei Atarashi, H. Kashima · 14 Aug 2025

Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond
Jiaxin Deng, Qingcheng Zhu, Junbiao Pang, Linlin Yang, Zhongqian Fu, Baochang Zhang · 01 Aug 2025

Symmetry in Neural Network Parameter Spaces
Bo Zhao, Robin Walters, Rose Yu · 16 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane · 06 Jun 2025

Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré · 06 Jun 2025

PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li, Niao He · 03 Jun 2025

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang, Bingcong Li, G. Giannakis · 24 May 2025

A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models
Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, Rene Vidal · 16 May 2025
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, S. Du, T. Zhao · 04 Mar 2025

Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij, Yasin Abbasi-Yadkori · 24 Feb 2025

The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Annual Conference Computational Learning Theory (COLT), 2025 · Sholom Schechtman, Nicolas Schreuder · 08 Feb 2025

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, Wenlin Yao · 01 Feb 2025

k-SVD with Gradient Descent
Yassir Jedra · 01 Feb 2025
Algebra Unveils Deep Learning -- An Invitation to Neuroalgebraic Geometry
Giovanni Luca Marchetti, Vahid Shahverdi, Stefano Mereta, Matthew Trager, Kathlén Kohn · 31 Jan 2025

Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe · MLT · 27 Jan 2025

Geometry and Optimization of Shallow Polynomial Networks
Yossi Arjevani, Joan Bruna, Joe Kileel, Elzbieta Polak, Matthew Trager · 10 Jan 2025

On subdifferential chain rule of matrix factorization and beyond
Jiewen Guan, Anthony Man-Cho So · AI4CE · 07 Oct 2024

How Feature Learning Can Improve Neural Scaling Laws
International Conference on Learning Representations (ICLR), 2024 · Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan · 26 Sep 2024

In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting
AAAI Conference on Artificial Intelligence (AAAI), 2024 · Constantin Philippenko, Kevin Scaman, Laurent Massoulié · FedML · 13 Sep 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro H. P. Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter · 21 Aug 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024 · D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli · MLT · 10 Jun 2024

Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar, R. Burkholz · 29 Feb 2024

Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
Mingyang Yi, Bohan Wang · 24 Jan 2024
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao · 26 Oct 2023

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
International Conference on Learning Representations (ICLR), 2023 · Nuoya Xiong, Lijun Ding, Simon S. Du · 03 Oct 2023

Deep Neural Networks Tend To Extrapolate Predictably
International Conference on Learning Representations (ICLR), 2023 · Katie Kang, Amrith Rajagopal Setlur, Claire Tomlin, Sergey Levine · 02 Oct 2023

Implicit Regularization Makes Overparameterized Asymmetric Matrix Sensing Robust to Perturbations
J. S. Wind · 04 Sep 2023

Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023 · Ruiqi Zhang, Spencer Frei, Peter L. Bartlett · 16 Jun 2023
Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
Neural Information Processing Systems (NeurIPS), 2023 · D. Chistikov, Matthias Englert, R. Lazic · MLT · 10 Jun 2023

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Neural Information Processing Systems (NeurIPS), 2023 · Chaoyue Liu, Dmitriy Drusvyatskiy, M. Belkin, Damek Davis, Yi-An Ma · ODL · 05 Jun 2023

Neural (Tangent Kernel) Collapse
Neural Information Processing Systems (NeurIPS), 2023 · Mariia Seleznova, Dana Weitzner, Raja Giryes, Gitta Kutyniok, H. Chou · 25 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
International Conference on Machine Learning (ICML), 2023 · Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon · 22 May 2023

Convergence of Alternating Gradient Descent for Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2023 · R. Ward, T. Kolda · 11 May 2023
On the Stepwise Nature of Self-Supervised Learning
International Conference on Machine Learning (ICML), 2023 · James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht · SSL · 27 Mar 2023

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
International Conference on Machine Learning (ICML), 2023 · Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar · 06 Mar 2023

Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Annual Conference Computational Learning Theory (COLT), 2023 · Weihang Xu, S. Du · 20 Feb 2023

How to prepare your task head for finetuning
International Conference on Learning Representations (ICLR), 2023 · Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland · 11 Feb 2023

Implicit Regularization for Group Sparsity
International Conference on Learning Representations (ICLR), 2023 · Jiangyuan Li, Thanh Van Nguyen, Chinmay Hegde, Raymond K. W. Wong · 29 Jan 2023
Effects of Data Geometry in Early Deep Learning
Neural Information Processing Systems (NeurIPS), 2022 · Saket Tiwari, George Konidaris · 29 Dec 2022

Improved Convergence Guarantees for Shallow Neural Networks
A. Razborov · ODL · 05 Dec 2022

Infinite-width limit of deep linear neural networks
Communications on Pure and Applied Mathematics (CPAM), 2022 · Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli · 29 Nov 2022

Mechanistic Mode Connectivity
International Conference on Machine Learning (ICML), 2022 · Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka · 15 Nov 2022

Symmetries, flat minima, and the conserved quantities of gradient flow
International Conference on Learning Representations (ICLR), 2022 · Bo Zhao, I. Ganev, Robin Walters, Rose Yu, Nima Dehmamy · 31 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
International Conference on Machine Learning (ICML), 2022 · Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma · AI4CE · 25 Oct 2022

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
International Conference on Learning Representations (ICLR), 2022 · Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Abigail Z. Jacobs, Chelsea Finn · OOD · 20 Oct 2022

Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 · Haotian Ye, James Zou, Linjun Zhang · OOD · 20 Oct 2022

Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
A. K. Akash, Sixu Li, Nicolas García Trillos · 13 Oct 2022

Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization
British Machine Vision Conference (BMVC), 2022 · Ziquan Liu, Antoni B. Chan · AAML · 11 Oct 2022