Learning time-scales in two-layers neural networks

28 February 2023

Papers citing "Learning time-scales in two-layers neural networks"

Title | Authors | Tags and counts | Date
Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime | Raphael Barboni, Gabriel Peyré, François-Xavier Vialard | MLT 27 0 0 | 25 Apr 2025
Survey on Algorithms for multi-index models | Joan Bruna, Daniel Hsu | 18 0 0 | 07 Apr 2025
A distributional simplicity bias in the learning dynamics of transformers | Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt | 68 7 0 | 17 Feb 2025
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions | Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel | 49 0 0 | 10 Feb 2025
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence | Berfin Simsek, Amire Bendjeddou, Daniel Hsu | 32 0 0 | 13 Nov 2024
Pretrained transformer efficiently learns low-dimensional target functions in-context | Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu | 23 4 0 | 04 Nov 2024
A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities | Yatin Dandi, Luca Pesce, Hugo Cui, Florent Krzakala, Yue M. Lu, Bruno Loureiro | MLT 30 1 0 | 24 Oct 2024
Task Diversity Shortens the ICL Plateau | Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu | MoMe 29 2 0 | 07 Oct 2024
Attention layers provably solve single-location regression | P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer | 42 2 0 | 02 Oct 2024
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics | Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu | MLT AI4CE 27 6 0 | 14 Aug 2024
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs | Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan | 35 1 0 | 04 Jun 2024
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit | Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu | MLT 71 20 0 | 03 Jun 2024
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks | Lorenzo Bardone, Sebastian Goldt | 25 7 0 | 12 Apr 2024
Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent | Pratik Patil, Yuchen Wu, R. Tibshirani | 33 4 0 | 26 Feb 2024
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth | Kevin Kögler, A. Shevchenko, Hamed Hassani, Marco Mondelli | MLT 17 0 0 | 07 Feb 2024
Asymptotics of feature learning in two-layer networks after one gradient-step | Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro | MLT 36 16 0 | 07 Feb 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape | Juno Kim, Taiji Suzuki | 11 18 0 | 02 Feb 2024
On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions | Simon Martin, Francis Bach, Giulio Biroli | 8 8 0 | 07 Nov 2023
Should Under-parameterized Student Networks Copy or Average Teacher Weights? | Berfin Simsek, Amire Bendjeddou, W. Gerstner, Johanni Brea | 14 6 0 | 03 Nov 2023
Grokking as the Transition from Lazy to Rich Training Dynamics | Tanishq Kumar, Blake Bordelon, Samuel Gershman, C. Pehlevan | 15 26 0 | 09 Oct 2023
Gradient-Based Feature Learning under Structured Data | Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu | MLT 15 18 0 | 07 Sep 2023
Six Lectures on Linearized Neural Networks | Theodor Misiakiewicz, Andrea Montanari | 21 12 0 | 25 Aug 2023
On Single Index Models beyond Gaussian Data | Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig | 8 10 0 | 28 Jul 2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit | Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy | 11 29 0 | 30 Jun 2023
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time | Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan | MLT 19 25 0 | 29 May 2023
Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD | Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan | MLT 10 3 0 | 29 May 2023
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks | Raffaele Marino, F. Ricci-Tersenghi | 10 14 0 | 10 May 2023
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks | Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro | MLT 12 31 0 | 12 Feb 2023
Learning Single-Index Models with Shallow Neural Networks | A. Bietti, Joan Bruna, Clayton Sanford, M. Song | 150 65 0 | 27 Oct 2022