Provable Convergence of Nesterov Accelerated Method for Over-Parameterized Neural Networks

5 July 2021
Xin Liu
Zhisong Pan
arXiv:2107.01832 (abs · PDF · HTML)
Abstract

Despite the empirical success of deep learning, we still lack a theoretical understanding of why a randomly initialized neural network trained by first-order optimization methods can reach zero training loss, even though its landscape is non-convex and non-smooth. Recently, several works have demystified this phenomenon in the over-parameterized regime. In this work, we make further progress in this area by considering a commonly used momentum algorithm: the Nesterov accelerated gradient method (NAG). We analyze the convergence of NAG for a two-layer fully connected neural network with ReLU activation. Specifically, we prove that the error of NAG converges to zero at a linear rate $1-\Theta(1/\sqrt{\kappa})$, where $\kappa > 1$ is determined by the initialization and the architecture of the neural network. Compared with the rate $1-\Theta(1/\kappa)$ of gradient descent, NAG achieves an acceleration. Our analysis also validates that NAG and the heavy-ball method achieve a similar convergence rate.
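As a concrete illustration of the setting described in the abstract, the sketch below runs NAG on a randomly initialized two-layer ReLU network with a fixed ±1 output layer and squared loss, which is the standard setup in over-parameterization analyses. The synthetic data, width m, step size eta, and momentum beta are illustrative placeholders, not the paper's choices.

```python
# Minimal sketch (not the authors' code): NAG on a two-layer ReLU network
# f(x; W) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x), where the output layer
# a_r in {+1, -1} is fixed and only the hidden layer W is trained.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n unit-norm inputs in d dimensions with scalar targets.
n, d, m = 50, 10, 2048
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)

# Random initialization: rows of W ~ N(0, I), a_r uniform on {+1, -1}.
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)

def predict(W):
    """Two-layer ReLU network output on all n inputs."""
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

def grad(W):
    """Gradient of the squared loss 0.5 * ||f(X; W) - y||^2 w.r.t. W."""
    pre = X @ W.T                      # (n, m) pre-activations
    err = predict(W) - y               # (n,) residuals
    act = (pre > 0).astype(float)      # ReLU derivative
    # dL/dw_r = (a_r / sqrt(m)) * sum_i err_i * 1[w_r^T x_i > 0] * x_i
    return ((act * err[:, None]).T @ X) * (a[:, None] / np.sqrt(m))

# Nesterov accelerated gradient: take the gradient step at the extrapolated
# point v, then extrapolate with momentum. In the analysis beta is set from
# kappa (roughly (sqrt(kappa)-1)/(sqrt(kappa)+1)); here it is a fixed value
# chosen purely for illustration.
eta, beta = 0.01, 0.9
v = W.copy()
for k in range(500):
    W_next = v - eta * grad(v)
    v = W_next + beta * (W_next - W)
    W = W_next
    if k % 100 == 0:
        print(k, 0.5 * np.sum((predict(W) - y) ** 2))
```

With sufficient width m, the training loss printed above decays roughly geometrically, which is the qualitative behavior the linear-rate result describes.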
