Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

19 February 2024
Semih Cayci
A. Eryilmaz
arXiv: 2402.12241
Abstract

We analyze recurrent neural networks trained with gradient descent in the supervised learning setting for dynamical systems, and prove that gradient descent can achieve optimality \emph{without} massive overparameterization. Our in-depth nonasymptotic analysis (i) provides sharp bounds on the network size $m$ and iteration complexity $\tau$ in terms of the sequence length $T$, sample size $n$, and ambient dimension $d$, and (ii) identifies the significant impact of long-term dependencies in the dynamical system on the convergence and network-width bounds, characterized by a cutoff point that depends on the Lipschitz continuity of the activation function. Remarkably, this analysis reveals that an appropriately initialized recurrent neural network trained with $n$ samples can achieve optimality with a network size $m$ that scales only logarithmically with $n$. This sharply contrasts with prior works, which require a high-order polynomial dependency of $m$ on $n$ to establish strong regularity conditions. Our results are based on an explicit characterization of the class of dynamical systems that can be approximated and learned by recurrent neural networks via norm-constrained transportation mappings, and on establishing local smoothness properties of the hidden state with respect to the learnable parameters.
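To make the supervised-learning setting concrete, the sketch below trains a small Elman-style RNN with plain full-batch gradient descent to fit scalar targets generated by a toy dynamical system from input sequences of length $T$. The architecture, tanh activation, Gaussian initialization scale, step size, fixed output layer, and the synthetic target system are all illustrative assumptions for exposition; this is not the paper's construction, initialization scheme, or proof technique.

```python
# Minimal sketch (assumptions noted above): an Elman-style RNN of width m,
# trained with full-batch gradient descent to fit scalar targets produced by
# a toy dynamical system from length-T input sequences.
import numpy as np

rng = np.random.default_rng(0)
n, T, d, m = 64, 10, 4, 128          # samples, sequence length, input dim, width

# Synthetic "dynamical system" targets: a fixed random nonlinear recursion.
A_true = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)
X = rng.standard_normal((n, T, d))

def target(x_seq):
    s = np.zeros(d)
    for t in range(T):
        s = np.tanh(A_true @ s + x_seq[t])
    return s.sum()

y = np.array([target(X[i]) for i in range(n)])

# RNN parameters with random Gaussian initialization (tanh activation).
W = rng.standard_normal((m, m)) / np.sqrt(m)   # hidden-to-hidden
U = rng.standard_normal((m, d)) / np.sqrt(d)   # input-to-hidden
c = rng.standard_normal(m) / np.sqrt(m)        # linear output layer (kept fixed here)

def forward(x_seq, W, U):
    """Run the RNN over one sequence; return prediction and hidden states."""
    hs = [np.zeros(m)]
    for t in range(T):
        hs.append(np.tanh(W @ hs[-1] + U @ x_seq[t]))
    return c @ hs[-1], hs

def loss_and_grads(W, U):
    """Average squared loss over n samples and gradients via backprop through time."""
    gW, gU, loss = np.zeros_like(W), np.zeros_like(U), 0.0
    for i in range(n):
        pred, hs = forward(X[i], W, U)
        err = pred - y[i]
        loss += 0.5 * err ** 2 / n
        delta = err * c / n                      # gradient w.r.t. the last hidden state
        for t in range(T, 0, -1):
            pre = delta * (1.0 - hs[t] ** 2)     # through the tanh nonlinearity
            gW += np.outer(pre, hs[t - 1])
            gU += np.outer(pre, X[i][t - 1])
            delta = W.T @ pre                    # propagate to the previous hidden state
    return loss, gW, gU

# Plain gradient descent on the recurrent and input weights.
eta = 0.05
for step in range(200):
    loss, gW, gU = loss_and_grads(W, U)
    W -= eta * gW
    U -= eta * gU
    if step % 50 == 0:
        print(f"step {step:3d}  training loss {loss:.4f}")
```

The width, step size, and number of iterations here are arbitrary; the paper's contribution is precisely to bound how large $m$ and the iteration count $\tau$ must be in terms of $T$, $n$, and $d$ for such gradient descent to reach optimality.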
