Learning time-scales in two-layers neural networks
Raphael Berthier, Andrea Montanari, Kangjie Zhou
arXiv:2303.00055 (28 February 2023)
Papers citing "Learning time-scales in two-layers neural networks" (29 papers)
Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime. Raphael Barboni, Gabriel Peyré, François-Xavier Vialard. 25 Apr 2025. [MLT]
Survey on Algorithms for multi-index models. Joan Bruna, Daniel Hsu. 07 Apr 2025.
A distributional simplicity bias in the learning dynamics of transformers. Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt. 17 Feb 2025.
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions. Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel. 10 Feb 2025.
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence. Berfin Simsek, Amire Bendjeddou, Daniel Hsu. 13 Nov 2024.
Pretrained transformer efficiently learns low-dimensional target functions in-context. Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu. 04 Nov 2024.
A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities. Yatin Dandi, Luca Pesce, Hugo Cui, Florent Krzakala, Yue M. Lu, Bruno Loureiro. 24 Oct 2024. [MLT]
Task Diversity Shortens the ICL Plateau. Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu. 07 Oct 2024. [MoMe]
Attention layers provably solve single-location regression. P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer. 02 Oct 2024.
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics. Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu. 14 Aug 2024. [MLT, AI4CE]
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs. Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan. 04 Jun 2024.
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit. Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu. 03 Jun 2024. [MLT]
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks. Lorenzo Bardone, Sebastian Goldt. 12 Apr 2024.
Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent. Pratik Patil, Yuchen Wu, R. Tibshirani. 26 Feb 2024.
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth. Kevin Kögler, A. Shevchenko, Hamed Hassani, Marco Mondelli. 07 Feb 2024. [MLT]
Asymptotics of feature learning in two-layer networks after one gradient-step. Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro. 07 Feb 2024. [MLT]
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape. Juno Kim, Taiji Suzuki. 02 Feb 2024.
On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions. Simon Martin, Francis Bach, Giulio Biroli. 07 Nov 2023.
Should Under-parameterized Student Networks Copy or Average Teacher Weights? Berfin Simsek, Amire Bendjeddou, W. Gerstner, Johanni Brea. 03 Nov 2023.
Grokking as the Transition from Lazy to Rich Training Dynamics. Tanishq Kumar, Blake Bordelon, Samuel Gershman, C. Pehlevan. 09 Oct 2023.
Gradient-Based Feature Learning under Structured Data. Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu. 07 Sep 2023. [MLT]
Six Lectures on Linearized Neural Networks. Theodor Misiakiewicz, Andrea Montanari. 25 Aug 2023.
On Single Index Models beyond Gaussian Data. Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig. 28 Jul 2023.
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit. Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy. 30 Jun 2023.
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time. Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan. 29 May 2023. [MLT]
Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD. Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan. 29 May 2023. [MLT]
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks. Raffaele Marino, F. Ricci-Tersenghi. 10 May 2023.
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks. Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro. 12 Feb 2023. [MLT]
Learning Single-Index Models with Shallow Neural Networks. A. Bietti, Joan Bruna, Clayton Sanford, M. Song. 27 Oct 2022.