On Lazy Training in Differentiable Programming (arXiv:1812.07956)
19 December 2018
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach
Papers citing "On Lazy Training in Differentiable Programming" (50 of 222 shown)
  • L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers (12 May 2025)
    S. Casarin, Sergio Escalera, Oswald Lanz
  • Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures (11 May 2025)
    Francesco Cagnetta, Alessandro Favero, Antonio Sclocchi, M. Wyart
  • Don't be lazy: CompleteP enables compute-efficient deep transformers (02 May 2025)
    Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness
  • Generalization through variance: how noise shapes inductive biases in diffusion models (16 Apr 2025)
    John J. Vastola [DiffM]
  • On the Cone Effect in the Learning Dynamics (20 Mar 2025)
    Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan
  • Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks (08 Mar 2025)
    Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe [MLT]
  • Feature Learning Beyond the Edge of Stability (18 Feb 2025)
    Dávid Terjék [MLT]
  • Issues with Neural Tangent Kernel Approach to Neural Networks (19 Jan 2025)
    Haoran Liu, Anthony S. Tai, David J. Crandall, Chunfeng Huang
  • Grokking at the Edge of Numerical Stability (08 Jan 2025)
    Lucas Prieto, Melih Barsbey, Pedro A.M. Mediano, Tolga Birdal
  • Optimization Insights into Deep Diagonal Linear Networks (21 Dec 2024)
    Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega
  • Infinite Width Limits of Self Supervised Neural Networks (17 Nov 2024)
    Maximilian Fleissner, Gautham Govind Anil, D. Ghoshdastidar [SSL]
  • Robust Feature Learning for Multi-Index Models in High Dimensions (21 Oct 2024)
    Alireza Mousavi-Hosseini, Adel Javanmard, Murat A. Erdogdu [OOD, AAML]
  • Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations (19 Oct 2024)
    Bryan Chan, Anson Leung, James Bergstra [OffRL, OnRL]
  • On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory (08 Oct 2024)
    Guhan Chen, Yicheng Li, Qian Lin [AAML]
  • SHAP values via sparse Fourier representation (08 Oct 2024)
    Ali Gorji, Andisheh Amrollahi, A. Krause [FAtt]
  • Fast Training of Sinusoidal Neural Fields via Scaling Initialization (07 Oct 2024)
    Taesun Yeom, Sangyoon Lee, Jaeho Lee
  • The Optimization Landscape of SGD Across the Feature Learning Strength (06 Oct 2024)
    Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan
  • Attention layers provably solve single-location regression (02 Oct 2024)
    P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer
  • Investigating the Impact of Model Complexity in Large Language Models (01 Oct 2024)
    Jing Luo, Huiyuan Wang, Weiran Huang
  • How Feature Learning Can Improve Neural Scaling Laws (26 Sep 2024)
    Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan
  • From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks (22 Sep 2024)
    Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe
  • Continual learning with the neural tangent ensemble (30 Aug 2024)
    Ari S. Benjamin, Christian Pehle, Kyle Daruwalla [UQCV]
  • Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning (29 Aug 2024)
    Mohammadamin Banayeeanzade, Mahdi Soltanolkotabi, Mohammad Rostami [CLL, LRM]
  • Remove Symmetries to Control Model Expressivity and Improve Optimization (28 Aug 2024)
    Liu Ziyin, Yizhou Xu, Isaac Chuang [AAML]
  • Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics (14 Aug 2024)
    Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu [MLT, AI4CE]
  • Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective (24 Jul 2024)
    Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li [CLL]
  • MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation (11 Jun 2024)
    Lu Li, Tianze Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio [FedML, MoMe]
  • Understanding and Minimising Outlier Features in Neural Network Training (29 May 2024)
    Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann
  • When does compositional structure yield compositional generalization? A kernel theory (26 May 2024)
    Samuel Lippl, Kim Stachenfeld [NAI, CoGe]
  • Infinite Limits of Multi-head Transformer Dynamics (24 May 2024)
    Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan [AI4CE]
  • Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion (22 May 2024)
    Zhiwei Bai, Jiajie Zhao, Yaoyu Zhang [AI4CE]
  • Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks (12 Apr 2024)
    Matteo Tucat, Anirbit Mukherjee, Procheta Sen, Mingfei Sun, Omar Rivasplata [MLT]
  • NTK-Guided Few-Shot Class Incremental Learning (19 Mar 2024)
    Jingren Liu, Zhong Ji, Yanwei Pang, YunLong Yu [CLL]
  • Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations (12 Mar 2024)
    Akshay Kumar, Jarvis Haupt [ODL]
  • Fine-tuning with Very Large Dropout (01 Mar 2024)
    Jianyu Zhang, Léon Bottou
  • Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding (08 Feb 2024)
    Zhengqing Wu, Berfin Simsek, Francois Ged [ODL]
  • Critical Influence of Overparameterization on Sharpness-aware Minimization (29 Nov 2023)
    Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee [AAML]
  • How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization (03 Oct 2023)
    Nuoya Xiong, Lijun Ding, Simon S. Du
  • Elephant Neural Networks: Born to Be a Continual Learner (02 Oct 2023)
    Qingfeng Lan, A. R. Mahmood [CLL]
  • Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss (13 Sep 2023)
    T. Getu, Georges Kaddoum, M. Bennis
  • Gradient-Based Feature Learning under Structured Data (07 Sep 2023)
    Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu [MLT]
  • Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck (07 Sep 2023)
    Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
  • Likelihood-ratio-based confidence intervals for neural networks (04 Aug 2023)
    Laurens Sluijterman, Eric Cator, Tom Heskes [UQCV]
  • Fading memory as inductive bias in residual recurrent networks (27 Jul 2023)
    I. Dubinin, Felix Effenberger
  • Quantitative CLTs in Deep Neural Networks (12 Jul 2023)
    Stefano Favaro, Boris Hanin, Domenico Marinucci, I. Nourdin, G. Peccati [BDL]
  • The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions (17 Jun 2023)
    Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe [OffRL]
  • Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks (26 May 2023)
    Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, Ding-Xuan Zhou [MLT]
  • Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models (22 May 2023)
    Guillermo Ortiz-Jiménez, Alessandro Favero, P. Frossard [MoMe]
  • How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features (20 May 2023)
    Simone Bombari, Marco Mondelli [AAML]
  • Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks (11 May 2023)
    Eshaan Nichani, Alexandru Damian, Jason D. Lee [MLT]