Learning time-scales in two-layers neural networks
Raphael Berthier, Andrea Montanari, Kangjie Zhou
arXiv:2303.00055 (28 February 2023)

Papers citing "Learning time-scales in two-layers neural networks" (29 papers):
  • Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime
    Raphael Barboni, Gabriel Peyré, François-Xavier Vialard (25 Apr 2025) [MLT]
  • Survey on Algorithms for multi-index models
    Joan Bruna, Daniel Hsu (07 Apr 2025)
  • A distributional simplicity bias in the learning dynamics of transformers
    Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt (17 Feb 2025)
  • Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
    Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel (10 Feb 2025)
  • Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
    Berfin Simsek, Amire Bendjeddou, Daniel Hsu (13 Nov 2024)
  • Pretrained transformer efficiently learns low-dimensional target functions in-context
    Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu (04 Nov 2024)
  • A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities
    Yatin Dandi, Luca Pesce, Hugo Cui, Florent Krzakala, Yue M. Lu, Bruno Loureiro (24 Oct 2024) [MLT]
  • Task Diversity Shortens the ICL Plateau
    Jaeyeon Kim, Sehyun Kwon, Joo Young Choi, Jongho Park, Jaewoong Cho, Jason D. Lee, Ernest K. Ryu (07 Oct 2024) [MoMe]
  • Attention layers provably solve single-location regression
    P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer (02 Oct 2024)
  • Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
    Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu (14 Aug 2024) [MLT, AI4CE]
  • Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
    Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan (04 Jun 2024)
  • Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
    Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu (03 Jun 2024) [MLT]
  • Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
    Lorenzo Bardone, Sebastian Goldt (12 Apr 2024)
  • Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent
    Pratik Patil, Yuchen Wu, R. Tibshirani (26 Feb 2024)
  • Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
    Kevin Kögler, A. Shevchenko, Hamed Hassani, Marco Mondelli (07 Feb 2024) [MLT]
  • Asymptotics of feature learning in two-layer networks after one gradient-step
    Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro (07 Feb 2024) [MLT]
  • Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
    Juno Kim, Taiji Suzuki (02 Feb 2024)
  • On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions
    Simon Martin, Francis Bach, Giulio Biroli (07 Nov 2023)
  • Should Under-parameterized Student Networks Copy or Average Teacher Weights?
    Berfin Simsek, Amire Bendjeddou, W. Gerstner, Johanni Brea (03 Nov 2023)
  • Grokking as the Transition from Lazy to Rich Training Dynamics
    Tanishq Kumar, Blake Bordelon, Samuel Gershman, C. Pehlevan (09 Oct 2023)
  • Gradient-Based Feature Learning under Structured Data
    Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu (07 Sep 2023) [MLT]
  • Six Lectures on Linearized Neural Networks
    Theodor Misiakiewicz, Andrea Montanari (25 Aug 2023)
  • On Single Index Models beyond Gaussian Data
    Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig (28 Jul 2023)
  • The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
    Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy (30 Jun 2023)
  • How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
    Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan (29 May 2023) [MLT]
  • Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD
    Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan (29 May 2023) [MLT]
  • Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
    Raffaele Marino, F. Ricci-Tersenghi (10 May 2023)
  • From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
    Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro (12 Feb 2023) [MLT]
  • Learning Single-Index Models with Shallow Neural Networks
    A. Bietti, Joan Bruna, Clayton Sanford, M. Song (27 Oct 2022)