Characterizing Implicit Bias in Terms of Optimization Geometry

22 February 2018

Papers citing "Characterizing Implicit Bias in Terms of Optimization Geometry"

50 / 72 papers shown

Title
Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias Yura Malitsky Alexander Posch 19 0 0 05 May 2025
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks Chenyang Zhang Peifeng Gao Difan Zou Yuan Cao OOD MLT 59 0 0 11 Apr 2025
Theory on Mixture-of-Experts in Continual Learning Hongbo Li Sen-Fon Lin Lingjie Duan Yingbin Liang Ness B. Shroff MoE MoMe CLL 151 14 0 20 Feb 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations Yize Zhao Tina Behnia V. Vakilian Christos Thrampoulidis 55 8 0 20 Feb 2025
The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks Sholom Schechtman Nicolas Schreuder 120 0 0 08 Feb 2025
Optimization Insights into Deep Diagonal Linear Networks Hippolyte Labarrière C. Molinari Lorenzo Rosasco S. Villa Cristian Vega 76 0 0 21 Dec 2024
Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning Mohammadamin Banayeeanzade Mahdi Soltanolkotabi Mohammad Rostami CLL LRM 83 1 0 29 Aug 2024
Mask in the Mirror: Implicit Sparsification Tom Jacobs R. Burkholz 40 3 0 19 Aug 2024
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning Arthur Jacot Seok Hoan Choi Yuxiao Wen AI4CE 86 2 0 08 Jul 2024
Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets Arthur Jacot Alexandre Kaiser 36 0 0 27 May 2024
When does compositional structure yield compositional generalization? A kernel theory Samuel Lippl Kim Stachenfeld NAI CoGe 65 5 0 26 May 2024
Hidden Synergy: $L_1$ Weight Normalization and 1-Path-Norm Regularization Aditya Biswas 36 0 0 29 Apr 2024
High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile Jérémie Bigot Issa-Mbenard Dabo Camille Male 29 4 0 29 Mar 2024
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning Yuxiao Wen Arthur Jacot 47 6 0 12 Feb 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization Sungbin Shin Dongyeop Lee Maksym Andriushchenko Namhoon Lee AAML 39 1 0 29 Nov 2023
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models David X. Wu A. Sahai 21 2 0 23 Jun 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks Yuan Cao Difan Zou Yuan-Fang Li Quanquan Gu MLT 29 5 0 20 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage Yu Gui Cong Ma Yiqiao Zhong 17 6 0 06 Jun 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability Jingfeng Wu Vladimir Braverman Jason D. Lee 24 17 0 19 May 2023
Robust Implicit Regularization via Weight Normalization H. Chou Holger Rauhut Rachel A. Ward 28 7 0 09 May 2023
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Kuo-Wei Lai Vidya Muthukumar 16 5 0 13 Mar 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization Kayhan Behdin Qingquan Song Aman Gupta S. Keerthi Ayan Acharya Borja Ocejo Gregory Dexter Rajiv Khanna D. Durfee Rahul Mazumder AAML 13 7 0 19 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression Mo Zhou Rong Ge 27 2 0 01 Feb 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum Emmanuel Abbe Samy Bengio Aryo Lotfi Kevin Rizk LRM 28 47 0 30 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing Jikai Jin Zhiyuan Li Kaifeng Lyu S. Du Jason D. Lee MLT 40 34 0 27 Jan 2023
Tight bounds for maximum $\ell_1$-margin classifiers Stefan Stojanovic Konstantin Donhauser Fanny Yang 29 0 0 07 Dec 2022
Regression as Classification: Influence of Task Formulation on Neural Network Features Lawrence Stewart Francis R. Bach Quentin Berthet Jean-Philippe Vert 27 24 0 10 Nov 2022
Stochastic Mirror Descent in Average Ensemble Models Taylan Kargin Fariborz Salehi B. Hassibi 11 1 0 27 Oct 2022
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent Satyen Kale Jason D. Lee Chris De Sa Ayush Sekhari Karthik Sridharan 19 4 0 13 Oct 2022
Annihilation of Spurious Minima in Two-Layer ReLU Networks Yossi Arjevani M. Field 16 8 0 12 Oct 2022
Deep Linear Networks can Benignly Overfit when Shallow Ones Do Niladri S. Chatterji Philip M. Long 13 8 0 19 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms Gal Vardi FedML AI4CE 30 72 0 26 Aug 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent Zhiyuan Li Tianhao Wang Jason D. Lee Sanjeev Arora 32 27 0 08 Jul 2022
Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation Loucas Pillaud-Vivien J. Reygner Nicolas Flammarion NoLa 31 31 0 20 Jun 2022
Reconstructing Training Data from Trained Neural Networks Niv Haim Gal Vardi Gilad Yehudai Ohad Shamir Michal Irani 27 132 0 15 Jun 2022
Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization I Zaghloul Amir Roi Livni Nathan Srebro 22 6 0 27 Feb 2022
Benign Overfitting in Adversarially Robust Linear Classification Jinghui Chen Yuan Cao Quanquan Gu AAML SILM 26 10 0 31 Dec 2021
The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program Yifei Wang Mert Pilanci MLT MDE 47 11 0 13 Oct 2021
Implicit Bias of Linear Equivariant Networks Hannah Lawrence Kristian Georgiev A. Dienes B. Kiani AI4CE 32 14 0 12 Oct 2021
On Margin Maximization in Linear and ReLU Networks Gal Vardi Ohad Shamir Nathan Srebro 45 28 0 06 Oct 2021
Spectral Bias in Practice: The Role of Function Frequency in Generalization Sara Fridovich-Keil Raphael Gontijo-Lopes Rebecca Roelofs 20 28 0 06 Oct 2021
A Theoretical Analysis of Fine-tuning with Linear Teachers Gal Shachaf Alon Brutzkus Amir Globerson 26 17 0 04 Jul 2021
What can linearized neural networks actually say about generalization? Guillermo Ortiz-Jiménez Seyed-Mohsen Moosavi-Dezfooli P. Frossard 21 43 0 12 Jun 2021
On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent Shahar Azulay E. Moroshko Mor Shpigel Nacson Blake E. Woodworth Nathan Srebro Amir Globerson Daniel Soudry AI4CE 25 73 0 19 Feb 2021
Obtaining Adjustable Regularization for Free via Iterate Averaging Jingfeng Wu Vladimir Braverman Lin F. Yang 19 2 0 15 Aug 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy E. Moroshko Suriya Gunasekar Blake E. Woodworth J. Lee Nathan Srebro Daniel Soudry 16 85 0 13 Jul 2020
When Does Preconditioning Help or Hurt Generalization? S. Amari Jimmy Ba Roger C. Grosse Xuechen Li Atsushi Nitanda Taiji Suzuki Denny Wu Ji Xu 26 32 0 18 Jun 2020
Neural Anisotropy Directions Guillermo Ortiz-Jiménez Apostolos Modas Seyed-Mohsen Moosavi-Dezfooli P. Frossard 26 16 0 17 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance Jeff Z. HaoChen Colin Wei J. Lee Tengyu Ma 18 93 0 15 Jun 2020
To Each Optimizer a Norm, To Each Norm its Generalization Sharan Vaswani Reza Babanezhad Jose Gallego Aaron Mishkin Simon Lacoste-Julien Nicolas Le Roux 13 8 0 11 Jun 2020