
Convergence of Shallow ReLU Networks on Weakly Interacting Data

Abstract

We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to show that a network with a width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak-Łojasiewicz viewpoint along the gradient-flow trajectory, which yields exponential convergence at a rate of order $1/n$. When the data are exactly orthogonal, we give a further refined characterization of the convergence speed, proving that its asymptotic rate lies between the orders $1/n$ and $1/\sqrt{n}$, and exhibiting a phase-transition phenomenon in the convergence rate, during which it evolves from the lower bound to the upper one over a relative time of order $1/\log(n)$.
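
To make the PL mechanism concrete: if the squared-error loss satisfies $\|\nabla L(\theta_t)\|^2 \ge 2\mu\, L(\theta_t)$ along the gradient-flow trajectory, with $\mu$ of order $1/n$, then Grönwall's lemma gives $L(\theta_t) \le L(\theta_0)\, e^{-2\mu t}$, which is the exponential rate stated above. The NumPy sketch below is an illustration of this setting, not the paper's construction: it discretizes gradient flow with small Euler steps on near-orthogonal high-dimensional Gaussian data and a width of order $\log(n)$. The width factor, step size, horizon, and the choice to train only the inner layer (with fixed outer signs) are assumptions made for the demonstration.

import numpy as np

rng = np.random.default_rng(0)

# n samples in dimension d >> n: pairwise correlations are O(1/sqrt(d)),
# i.e. the data are weakly interacting.
n, d = 200, 2000
m = int(np.ceil(4 * np.log(n)))   # width of order log(n); the factor 4 is an assumption

X = rng.standard_normal((n, d)) / np.sqrt(d)     # rows have norm close to 1
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))                  # trained inner weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m) # fixed outer signs (simplification)

def loss_and_grad(W):
    pre = X @ W.T                 # (n, m) pre-activations
    act = np.maximum(pre, 0.0)    # ReLU
    res = act @ a - y             # residuals f(x_i) - y_i
    loss = 0.5 * np.mean(res**2)
    # d loss / d w_j = (1/n) sum_i res_i * a_j * 1[w_j . x_i > 0] * x_i
    grad = ((res[:, None] * (pre > 0)) * a[None, :]).T @ X / n
    return loss, grad

eta, T = 0.2, 20000               # small step size to mimic gradient flow
for t in range(T + 1):
    loss, grad = loss_and_grad(W)
    W -= eta * grad
    if t % 2000 == 0:
        print(f"t={t:6d}  loss={loss:.3e}")   # loss should decay roughly geometrically

With these parameters the printed loss shrinks by several orders of magnitude, consistent with an exponential decay whose rate constant scales like $1/n$; the specific constants are illustrative only.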

@article{dana2025_2502.16977,
  title={Convergence of Shallow ReLU Networks on Weakly Interacting Data},
  author={Léo Dana and Francis Bach and Loucas Pillaud-Vivien},
  journal={arXiv preprint arXiv:2502.16977},
  year={2025}
}