Kernel and Rich Regimes in Overparametrized Models

13 June 2019

Papers citing "Kernel and Rich Regimes in Overparametrized Models"

Showing 50 of 85 citing papers

Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias Yura Malitsky Alexander Posch 27 0 0 05 May 2025
Generalization through variance: how noise shapes inductive biases in diffusion models John J. Vastola DiffM 138 1 0 16 Apr 2025
On the Cone Effect in the Learning Dynamics Zhanpeng Zhou Yongyi Yang Jie Ren Mahito Sugiyama Junchi Yan 53 0 0 20 Mar 2025
MLPs at the EOC: Dynamics of Feature Learning Dávid Terjék MLT 41 0 0 18 Feb 2025
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries Chris Kolb T. Weber Bernd Bischl David Rügamer 104 0 0 04 Feb 2025
Optimization Insights into Deep Diagonal Linear Networks Hippolyte Labarrière C. Molinari Lorenzo Rosasco S. Villa Cristian Vega 76 0 0 21 Dec 2024
Fast Training of Sinusoidal Neural Fields via Scaling Initialization Taesun Yeom Sangyoon Lee Jaeho Lee 53 2 0 07 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength Alexander B. Atanasov Alexandru Meterez James B. Simon C. Pehlevan 43 2 0 06 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws Blake Bordelon Alexander B. Atanasov C. Pehlevan 49 12 0 26 Sep 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks Clémentine Dominé Nicolas Anguita A. Proca Lukas Braun D. Kunin P. Mediano Andrew M. Saxe 30 3 0 22 Sep 2024
Mask in the Mirror: Implicit Sparsification Tom Jacobs R. Burkholz 40 3 0 19 Aug 2024
Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning Muhammad Salman Ali Maryam Qamar Sung-Ho Bae Enzo Tartaglione 3DGS 38 11 0 26 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD Pierfrancesco Beneventano Andrea Pinto Tomaso A. Poggio MLT 27 1 0 17 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees A. Banerjee Qiaobo Li Yingxue Zhou 44 0 0 11 Jun 2024
When does compositional structure yield compositional generalization? A kernel theory Samuel Lippl Kim Stachenfeld NAI CoGe 67 5 0 26 May 2024
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations Akshay Kumar Jarvis D. Haupt ODL 44 3 0 12 Mar 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization Sungbin Shin Dongyeop Lee Maksym Andriushchenko Namhoon Lee AAML 41 1 0 29 Nov 2023
Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics Yehonatan Avidan Qianyi Li H. Sompolinsky 60 8 0 08 Sep 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization Kaiyue Wen Zhiyuan Li Tengyu Ma FAtt 30 26 0 20 Jul 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond Itai Kreisler Mor Shpigel Nacson Daniel Soudry Y. Carmon 23 13 0 22 May 2023
Convex optimization over a probability simplex James Chok G. Vasil 23 2 0 15 May 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks Eshaan Nichani Alexandru Damian Jason D. Lee MLT 36 13 0 11 May 2023
Robust Implicit Regularization via Weight Normalization H. Chou Holger Rauhut Rachel A. Ward 28 7 0 09 May 2023
Saddle-to-Saddle Dynamics in Diagonal Linear Networks Scott Pesme Nicolas Flammarion 29 35 0 02 Apr 2023
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Kuo-Wei Lai Vidya Muthukumar 18 5 0 13 Mar 2023
Phase Diagram of Initial Condensation for Two-layer Neural Networks Zheng Chen Yuqing Li Tao Luo Zhaoguang Zhou Z. Xu MLT AI4CE 41 8 0 12 Mar 2023
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions Marwan Omar SILM AAML 25 20 0 14 Feb 2023
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning François Caron Fadhel Ayed Paul Jung Hoileong Lee Juho Lee Hongseok Yang 59 2 0 02 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression Mo Zhou Rong Ge 27 2 0 01 Feb 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing Jikai Jin Zhiyuan Li Kaifeng Lyu S. Du Jason D. Lee MLT 46 34 0 27 Jan 2023
Infinite-width limit of deep linear neural networks Lénaïc Chizat Maria Colombo Xavier Fernández-Real Alessio Figalli 31 14 0 29 Nov 2022
Characterizing the Spectrum of the NTK via a Power Series Expansion Michael Murray Hui Jin Benjamin Bowman Guido Montúfar 30 11 0 15 Nov 2022
Learning Single-Index Models with Shallow Neural Networks A. Bietti Joan Bruna Clayton Sanford M. Song 162 67 0 27 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models Hong Liu Sang Michael Xie Zhiyuan Li Tengyu Ma AI4CE 32 49 0 25 Oct 2022
Continual task learning in natural and artificial agents Timo Flesch Andrew M. Saxe Christopher Summerfield CLL 35 24 0 10 Oct 2022
Deep Linear Networks can Benignly Overfit when Shallow Ones Do Niladri S. Chatterji Philip M. Long 15 8 0 19 Sep 2022
Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty Thomas George Guillaume Lajoie A. Baratin 23 5 0 19 Sep 2022
Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization) Zhenyu Zhu Fanghui Liu Grigorios G. Chrysos V. Cevher 39 19 0 15 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms Gal Vardi FedML AI4CE 30 72 0 26 Aug 2022
Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution Jianhao Ma S. Fattahi 40 5 0 15 Jul 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent Zhiyuan Li Tianhao Wang Jason D. Lee Sanjeev Arora 32 27 0 08 Jul 2022
Neural Networks can Learn Representations with Gradient Descent Alexandru Damian Jason D. Lee Mahdi Soltanolkotabi SSL MLT 17 112 0 30 Jun 2022
Learning sparse features can lead to overfitting in neural networks Leonardo Petrini Francesco Cagnetta Eric Vanden-Eijnden M. Wyart MLT 29 23 0 24 Jun 2022
Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out Jun-Kun Wang Chi-Heng Lin Andre Wibisono Bin Hu 19 20 0 22 Jun 2022
Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation Loucas Pillaud-Vivien J. Reygner Nicolas Flammarion NoLa 31 31 0 20 Jun 2022
Reconstructing Training Data from Trained Neural Networks Niv Haim Gal Vardi Gilad Yehudai Ohad Shamir Michal Irani 34 132 0 15 Jun 2022
Towards Understanding Sharpness-Aware Minimization Maksym Andriushchenko Nicolas Flammarion AAML 24 133 0 13 Jun 2022
Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials Eshaan Nichani Yunzhi Bai Jason D. Lee 21 10 0 08 Jun 2022
Generalized Federated Learning via Sharpness Aware Minimization Zhe Qu Xingyu Li Rui Duan Yaojiang Liu Bo Tang Zhuo Lu FedML 20 130 0 06 Jun 2022
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs Etienne Boursier Loucas Pillaud-Vivien Nicolas Flammarion ODL 19 58 0 02 Jun 2022