Implicit Regularization in Deep Learning May Not Be Explainable by Norms

13 May 2020

Noam Razin

Nadav Cohen

Papers citing "Implicit Regularization in Deep Learning May Not Be Explainable by Norms"

50 / 113 papers shown

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics

Connall Garrod

Jonathan P. Keating

Christos Thrampoulidis

217

03 Dec 2025

Why is Your Language Model a Poor Implicit Reward Model?

297

10 Jul 2025

264

21 Jun 2025

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

764

12 Jun 2025

Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

492

11 Apr 2025

Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture

Yikun Hou

Suvrit Sra

A. Yurtsever

411

27 Jan 2025

Weight decay induces low-rank attention layers

Neural Information Processing Systems (NeurIPS), 2024

Seijin Kobayashi

Yassir Akram

J. Oswald

313

31 Oct 2024

Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization

414

11 Oct 2024

Tailed Low-Rank Matrix Factorization for Similarity Matrix Completion

Changyi Ma

Runsheng Yu

Xiao Chen

Youzhi Zhang

273

29 Sep 2024

Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning

Nadav Cohen

Noam Razin

301

25 Aug 2024

Approaching Deep Learning through the Spectral Dynamics of Weights

389

21 Aug 2024

The Implicit Bias of Adam on Separable Data

Neural Information Processing Systems (NeurIPS), 2024

349

15 Jun 2024

Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

438

22 May 2024

On Uncertainty Quantification for Near-Bayes Optimal Algorithms

Ziyu Wang

Chris Holmes

UQCV

347

28 Mar 2024

Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems

Junwei Su

Difan Zou

Chuan Wu

476

13 Mar 2024

Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing

Hong T.M. Chu

Subhro Ghosh

Chi Thanh Lam

Soumendu Sundar Mukherjee

170

27 Feb 2024

On the Role of Initialization on the Implicit Bias in Deep Linear Networks

Oria Gruber

H. Avron

AI4CE

193

04 Feb 2024

Linear Recursive Feature Machines provably recover low-rank matrices

Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024

Adityanarayanan Radhakrishnan

Misha Belkin

Dmitriy Drusvyatskiy

448

09 Jan 2024

The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

Tolga Ergen

Mert Pilanci

246

19 Dec 2023

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

International Conference on Learning Representations (ICLR), 2023

426

30 Nov 2023

In Search of a Data Transformation That Accelerates Neural Field Training

Computer Vision and Pattern Recognition (CVPR), 2023

416

28 Nov 2023

Vanishing Gradients in Reinforcement Finetuning of Language Models

International Conference on Learning Representations (ICLR), 2023

377

31 Oct 2023

A Quadratic Synchronization Rule for Distributed Deep Learning

International Conference on Learning Representations (ICLR), 2023

355

22 Oct 2023

Training Dynamics of Deep Network Linear Regions

Ahmed Imtiaz Humayun

Randall Balestriero

Richard Baraniuk

270

19 Oct 2023

Are GATs Out of Balance?

Neural Information Processing Systems (NeurIPS), 2023

Nimrah Mustafa

Aleksandar Bojchevski

R. Burkholz

406

11 Oct 2023

Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

J. S. Wind

Vegard Antun

A. Hansen

319

13 Jul 2023

Implicit regularisation in stochastic gradient descent: from single-objective to two-player games

Mihaela Rosca

M. Deisenroth

214

11 Jul 2023

The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks

International Conference on Learning Representations (ICLR), 2023

353

30 Jun 2023

Maintaining Plasticity in Deep Continual Learning

Shibhansh Dohare

J. F. Hernandez-Garcia

490

23 Jun 2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

243

22 Jun 2023

Exact Count of Boundary Pieces of ReLU Classifiers: Towards the Proper Complexity Measure for Classification

Conference on Uncertainty in Artificial Intelligence (UAI), 2023

Paweł Piwek

Adam Klukowski

Tianyang Hu

219

15 Jun 2023

Transformers learn through gradual rank increase

Neural Information Processing Systems (NeurIPS), 2023

428

12 Jun 2023

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs

Neural Information Processing Systems (NeurIPS), 2023

324

10 Jun 2023

Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

International Conference on Machine Learning (ICML), 2023

Baharan Mirzasoleiman

SSL

425

25 May 2023

Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank

International Conference on Learning Representations (ICLR), 2023

Zihan Wang

Arthur Jacot

355

25 May 2023

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models

SIAM Journal on Mathematics of Data Science (SIMODS), 2023

Suzanna Parkinson

Greg Ongie

Rebecca Willett

594

24 May 2023

Exploring the Complexity of Deep Neural Networks through Functional Equivalence

International Conference on Machine Learning (ICML), 2023

Guohao Shen

492

19 May 2023

Robust Implicit Regularization via Weight Normalization

Information and Inference: A Journal of the IMA (JIII), 2023

H. Chou

Holger Rauhut

Rachel A. Ward

499

09 May 2023

Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems

Kevin Zeng

Carlos E. Pérez De Jesús

Andrew J Fox

M. Graham

AI4CE

412

01 May 2023

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

Journal of Machine Learning Research (JMLR), 2023

Sebastian Neumayer

Lénaïc Chizat

M. Unser

285

31 Mar 2023

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement

Neural Information Processing Systems (NeurIPS), 2023

467

20 Mar 2023

First-order ANIL learns linear representations despite misspecified latent dimension

Oğuz Kaan Yüksel

Etienne Boursier

Nicolas Flammarion

345

02 Mar 2023

Penalising the biases in norm regularisation enforces sparsity

Neural Information Processing Systems (NeurIPS), 2023

Etienne Boursier

Nicolas Flammarion

619

02 Mar 2023

Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent

International Conference on Learning Representations (ICLR), 2023

308

02 Feb 2023

Simplicity Bias in 1-Hidden Layer Neural Networks

Neural Information Processing Systems (NeurIPS), 2023

304

01 Feb 2023

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

International Conference on Machine Learning (ICML), 2023

547

30 Jan 2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing

International Conference on Machine Learning (ICML), 2023

361

27 Jan 2023

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

425

29 Dec 2022

Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

Neural Information Processing Systems (NeurIPS), 2022

Daesung Kim

Hye Won Chung

349

19 Dec 2022

On the Ability of Graph Neural Networks to Model Interactions Between Vertices

Neural Information Processing Systems (NeurIPS), 2022

Noam Razin

Tom Verbin

Nadav Cohen

477

29 Nov 2022