Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

12 April 2024
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
Abstract

We present and analyze a novel regularized form of the gradient clipping algorithm, proving that it converges to global minima of the loss surface of deep neural networks under the squared loss, provided that the layers are of sufficient width. The algorithm presented here, dubbed δ-GClip, introduces a modification to gradient clipping that yields a first-of-its-kind example of a step-size schedule for gradient descent that provably minimizes training losses of deep neural nets. We also present empirical evidence that our theoretically founded δ-GClip algorithm is competitive with state-of-the-art deep learning heuristics on various neural architectures, including modern transformer-based architectures. The modification we make to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Łojasiewicz inequality which was recently proven to hold for sufficiently wide neural networks of any depth within a neighbourhood of the initialization.
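
The abstract does not state the update rule explicitly; the sketch below is one plausible reading of a "regularized" clipping step, in which the usual clipping factor min(1, γ/‖∇L‖) is floored at a constant δ > 0 so the effective step size never vanishes. The function name, the hyperparameters eta, gamma and delta, and the exact form of the floor are assumptions for illustration, not taken from the paper.

import torch

def delta_gclip_step(params, loss, eta=0.1, gamma=1.0, delta=0.01):
    """One hypothetical regularized gradient-clipping update.

    Standard gradient clipping scales the step by min(1, gamma / ||grad||).
    Here that scale is additionally floored at delta > 0, so the effective
    step size stays bounded away from zero (assumed reading of the abstract).
    """
    # Gradients of the loss with respect to all parameters.
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))

    # Clipping factor with a lower bound of delta (delta = 0 recovers
    # ordinary gradient clipping).
    scale = min(1.0, max(delta, gamma / (grad_norm.item() + 1e-12)))

    # Plain gradient-descent update with the clipped, floored step size.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * scale * g

With delta set to 0 this reduces to ordinary gradient clipping; the positive floor is, as described in the abstract, what keeps the step size compatible with the PL*-based convergence argument for sufficiently wide networks.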

@article{tucat2025_2404.08624,
  title={Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks},
  author={Matteo Tucat and Anirbit Mukherjee and Procheta Sen and Mingfei Sun and Omar Rivasplata},
  journal={arXiv preprint arXiv:2404.08624},
  year={2025}
}