High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

3 May 2022

Jimmy Ba

Papers citing "High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation"

50 / 98 papers shown

Title | Authors | Tags | Counts | Date
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures | Francesco Cagnetta, Alessandro Favero, Antonio Sclocchi, M. Wyart | | 21 0 0 | 11 May 2025
Survey on Algorithms for multi-index models | Joan Bruna, Daniel Hsu | | 18 0 0 | 07 Apr 2025
Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions | Fabiola Ricci, Lorenzo Bardone, Sebastian Goldt | OOD | 33 0 0 | 31 Mar 2025
Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry | Chi-Ning Chou, Hang Le, Yichen Wang, SueYeon Chung | | 44 0 0 | 23 Mar 2025
On the Cone Effect in the Learning Dynamics | Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan | | 46 0 0 | 20 Mar 2025
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective | Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu | | 43 0 0 | 14 Mar 2025
Asymptotic Analysis of Two-Layer Neural Networks after One Gradient Step under Gaussian Mixtures Data with Structure | Samet Demir, Zafer Dogan | MLT | 34 0 0 | 02 Mar 2025
A distributional simplicity bias in the learning dynamics of transformers | Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt | | 68 8 0 | 17 Feb 2025
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions | Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel | | 54 0 0 | 10 Feb 2025
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble | Atsushi Nitanda, Anzelle Lee, Damian Tan Xing Kai, Mizuki Sakaguchi, Taiji Suzuki | AI4CE | 53 1 0 | 09 Feb 2025
The Complexity of Learning Sparse Superposed Features with Feedback | Akash Kumar | | 92 0 0 | 08 Feb 2025
Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization | Simone Bombari, Marco Mondelli | | 113 0 0 | 03 Feb 2025
Optimal Exact Recovery in Semi-Supervised Learning: A Study of Spectral Methods and Graph Convolutional Networks | Hai-Xiao Wang, Zhichao Wang | | 71 1 0 | 18 Dec 2024
On the Efficiency of ERM in Feature Learning | Ayoub El Hanchi, Chris J. Maddison, Murat A. Erdogdu | | 62 0 0 | 18 Nov 2024
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence | Berfin Simsek, Amire Bendjeddou, Daniel Hsu | | 36 0 0 | 13 Nov 2024
Pretrained transformer efficiently learns low-dimensional target functions in-context | Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu | | 34 4 0 | 04 Nov 2024
On the phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance | Jean Barbier, Francesco Camilli, Justin Ko, Koki Okajima | | 21 5 0 | 04 Nov 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers | Gavia Gray, Aman Tiwari, Shane Bergsma, Joel Hestness | | 25 1 0 | 01 Nov 2024
A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities | Yatin Dandi, Luca Pesce, Hugo Cui, Florent Krzakala, Yue M. Lu, Bruno Loureiro | MLT | 35 1 0 | 24 Oct 2024
Robust Feature Learning for Multi-Index Models in High Dimensions | Alireza Mousavi-Hosseini, Adel Javanmard, Murat A. Erdogdu | OOD AAML | 42 1 0 | 21 Oct 2024
Generalization for Least Squares Regression With Simple Spiked Covariances | Jiping Li, Rishi Sonthalia | | 23 0 0 | 17 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods | Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar | MLT | 31 0 0 | 13 Oct 2024
Task Diversity Shortens the ICL Plateau | Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu | MoMe | 29 2 0 | 07 Oct 2024
Random Features Outperform Linear Models: Effect of Strong Input-Label Correlation in Spiked Covariance Data | Samet Demir, Zafer Dogan | | 28 1 0 | 30 Sep 2024
How Feature Learning Can Improve Neural Scaling Laws | Blake Bordelon, Alexander B. Atanasov, C. Pehlevan | | 49 12 0 | 26 Sep 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks | Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe | | 30 3 0 | 22 Sep 2024
Improving Adaptivity via Over-Parameterization in Sequence Models | Yicheng Li, Qian Lin | | 14 1 0 | 02 Sep 2024
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics | Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu | MLT AI4CE | 27 6 0 | 14 Aug 2024
On the Generalization of Preference Learning with DPO | Shawn Im, Yixuan Li | | 44 1 0 | 06 Aug 2024
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning | D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli | MLT | 35 15 0 | 10 Jun 2024
Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise | Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang | | 29 3 0 | 07 Jun 2024
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs | Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan | | 43 1 0 | 04 Jun 2024
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit | Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu | MLT | 87 21 0 | 03 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training | Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann | | 34 3 0 | 29 May 2024
Signal-Plus-Noise Decomposition of Nonlinear Spiked Random Matrix Models | Behrad Moniri, Hamed Hassani | | 29 0 0 | 28 May 2024
Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data | Nikita Tsoy, Nikola Konstantinov | | 32 4 0 | 27 May 2024
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions | Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan | | 61 11 0 | 24 May 2024
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis | Yufan Li, Subhabrata Sen, Ben Adlam | MLT | 31 1 0 | 18 Apr 2024
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks | Lorenzo Bardone, Sebastian Goldt | | 30 7 0 | 12 Apr 2024
Understanding the Learning Dynamics of Alignment with Human Feedback | Shawn Im, Yixuan Li | ALM | 24 11 0 | 27 Mar 2024
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective | Shokichi Takakura, Taiji Suzuki | MLT | 17 5 0 | 22 Mar 2024
Generalization of Scaled Deep ResNets in the Mean-Field Regime | Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios G. Chrysos, V. Cevher | | 28 2 0 | 14 Mar 2024
Towards a theory of model distillation | Enric Boix-Adserà | FedML VLM | 44 6 0 | 14 Mar 2024
Fundamental limits of Non-Linear Low-Rank Matrix Estimation | Pierre Mergny, Justin Ko, Florent Krzakala, Lenka Zdeborová | | 18 1 0 | 07 Mar 2024
Asymptotics of Learning with Deep Structured (Random) Features | Dominik Schröder, Daniil Dmitriev, Hugo Cui, Bruno Loureiro | | 40 6 0 | 21 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains | Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis | BDL | 21 39 0 | 16 Feb 2024
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks | Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala | MLT | 34 2 0 | 07 Feb 2024
Asymptotics of feature learning in two-layer networks after one gradient-step | Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro | MLT | 44 16 0 | 07 Feb 2024
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents | Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala | MLT | 59 25 0 | 05 Feb 2024
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features | Simone Bombari, Marco Mondelli | | 26 3 0 | 05 Feb 2024