Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK
Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang
Annual Conference on Computational Learning Theory (COLT), 2020 · 9 July 2020 · arXiv:2007.04596
Tags: MLT

Papers citing "Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK"

21 of 21 citing papers shown:
• Feature learning is decoupled from generalization in high capacity neural networks. Niclas Goring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis. OOD, MLT. 25 Jul 2025 · 119 / 1 / 0
• How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings. Samuel Audia, Soheil Feizi, Matthias Zwicker, Dinesh Manocha. International Conference on Learning Representations (ICLR), 2025. 18 Apr 2025 · 128 / 1 / 0
• SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem. Margalit Glasgow. International Conference on Learning Representations (ICLR), 2023. MLT. 26 Sep 2023 · 250 / 19 / 0
• Why Shallow Networks Struggle to Approximate and Learn High Frequencies. Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou. 29 Jun 2023 · 153 / 9 / 0
• Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron. Weihang Xu, S. Du. Annual Conference on Computational Learning Theory (COLT), 2023. 20 Feb 2023 · 215 / 19 / 0
• Vision Transformers provably learn spatial structure. Samy Jelassi, Michael E. Sander, Yuan-Fang Li. Neural Information Processing Systems (NeurIPS), 2022. ViT, MLT. 13 Oct 2022 · 143 / 96 / 0
• Neural Networks can Learn Representations with Gradient Descent. Alexandru Damian, Jason D. Lee, Mahdi Soltanolkotabi. Annual Conference on Computational Learning Theory (COLT), 2022. SSL, MLT. 30 Jun 2022 · 201 / 150 / 0
• The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning. Zixin Wen, Yuanzhi Li. Neural Information Processing Systems (NeurIPS), 2022. SSL. 12 May 2022 · 255 / 40 / 0
• Efficiently Learning Any One Hidden Layer ReLU Network From Queries. Sitan Chen, Adam R. Klivans, Raghu Meka. MLAU, MLT. 08 Nov 2021 · 160 / 8 / 0
• On the Provable Generalization of Recurrent Neural Networks. Lifu Wang, Bo Shen, Bo Hu, Xing Cao. 29 Sep 2021 · 271 / 9 / 0
• Deep Networks Provably Classify Data on Curves. Tingran Wang, Sam Buchanan, D. Gilboa, John N. Wright. Neural Information Processing Systems (NeurIPS), 2021. 29 Jul 2021 · 171 / 9 / 0
• Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. Dominik Stöger, Mahdi Soltanolkotabi. Neural Information Processing Systems (NeurIPS), 2021. ODL. 28 Jun 2021 · 293 / 85 / 0
• Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent. Spencer Frei, Quanquan Gu. Neural Information Processing Systems (NeurIPS), 2021. 25 Jun 2021 · 162 / 29 / 0
• Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning. Zixin Wen, Yuanzhi Li. International Conference on Machine Learning (ICML), 2021. SSL, MLT. 31 May 2021 · 259 / 148 / 0
• Why Do Local Methods Solve Nonconvex Problems? Tengyu Ma. 24 Mar 2021 · 78 / 13 / 0
• Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change. Ryuichi Kanoh, M. Sugiyama. 05 Mar 2021 · 78 / 0 / 0
• A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network. Mo Zhou, Rong Ge, Chi Jin. Annual Conference on Computational Learning Theory (COLT), 2021. 04 Feb 2021 · 209 / 50 / 0
• Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise. Spencer Frei, Yuan Cao, Quanquan Gu. International Conference on Machine Learning (ICML), 2021. FedML, MLT. 04 Jan 2021 · 288 / 22 / 0
• Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. Zeyuan Allen-Zhu, Yuanzhi Li. International Conference on Learning Representations (ICLR), 2020. FedML. 17 Dec 2020 · 414 / 426 / 0
• A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network. Jun-Kun Wang, Chi-Heng Lin, Jacob D. Abernethy. International Conference on Machine Learning (ICML), 2020. 04 Oct 2020 · 301 / 24 / 0
• Feature Purification: How Adversarial Training Performs Robust Deep Learning. Zeyuan Allen-Zhu, Yuanzhi Li. MLT, AAML. 20 May 2020 · 246 / 164 / 0