Curse of Dimensionality in Neural Network Optimization

Abstract

The curse of dimensionality in neural network optimization under the mean-field regime is studied. It is demonstrated that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ is an analog of the total number of optimization iterations. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, whose Lipschitz constant on $[-x,x]$ is bounded by $O(x^\delta)$ for any $x \in \mathbb{R}$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2\delta)r}{d-2r}}$. To the best of our knowledge, this work is the first to analyze the impact of function smoothness on the curse of dimensionality in neural network optimization theory.
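To see what the lower-bound rate implies in high dimension, one can invert it: reaching population risk $\epsilon$ requires a number of iterations that scales polynomially in $1/\epsilon$ with an exponent growing linearly in $d$. The worked example below is a rough sketch assuming the rate stated in the abstract; the values $d = 100$ and $r = 2$ are illustrative choices, not taken from the paper.

% Back-of-the-envelope calculation; d = 100 and r = 2 are hypothetical values.
\[
  \text{risk} \;\gtrsim\; t^{-\frac{4r}{d-2r}}
  \quad\Longrightarrow\quad
  t \;\gtrsim\; \epsilon^{-\frac{d-2r}{4r}},
  \qquad
  \frac{d-2r}{4r}\Big|_{d=100,\; r=2} = \frac{96}{8} = 12,
  \quad\text{so}\quad t \;\gtrsim\; \epsilon^{-12}.
\]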

@article{na2025_2502.05360,
  title={Curse of Dimensionality in Neural Network Optimization},
  author={Sanghoon Na and Haizhao Yang},
  journal={arXiv preprint arXiv:2502.05360},
  year={2025}
}