
Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness

Main: 9 pages
Appendix: 17 pages
Bibliography: 3 pages
8 figures, 3 tables
Abstract

This work introduces a hybrid non-Euclidean optimization method that generalizes gradient norm clipping by combining steepest descent and conditional gradient approaches. The method achieves the best of both worlds by establishing a descent property under a generalized notion of $(L_0, L_1)$-smoothness. Weight decay is incorporated in a principled manner by identifying a connection to the Frank-Wolfe short step. In the stochastic case, we show an order-optimal $O(n^{-1/4})$ convergence rate by leveraging a momentum-based gradient estimator. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
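For intuition, below is a minimal NumPy sketch of the classical Euclidean special case the abstract builds on: gradient norm clipping applied to a momentum-based gradient estimator, with decoupled weight decay. This is not the paper's algorithm; the function name, hyperparameters, and the plain decoupled decay are illustrative assumptions, whereas the paper works in a general non-Euclidean geometry and ties the weight decay to the Frank-Wolfe short step.

    import numpy as np

    def clipped_momentum_step(x, grad_fn, m, lr=1e-2, radius=1.0, beta=0.9, wd=0.0):
        # Momentum-based gradient estimator: exponential moving average
        # of stochastic gradients (reduces the variance of the update).
        m = beta * m + (1.0 - beta) * grad_fn(x)
        # Gradient norm clipping: take the raw step while ||m|| <= radius,
        # otherwise fall back to a normalized (steepest-descent-like) step.
        norm = np.linalg.norm(m)
        direction = m * min(1.0, radius / max(norm, 1e-12))
        # Decoupled weight decay (illustrative stand-in for the paper's
        # principled Frank-Wolfe short-step coupling).
        x_new = (1.0 - lr * wd) * x - lr * direction
        return x_new, m

    # Usage: minimize f(x) = ||x||^2 / 2 from noisy gradients.
    rng = np.random.default_rng(0)
    x, m = np.full(10, 5.0), np.zeros(10)
    noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
    for _ in range(500):
        x, m = clipped_momentum_step(x, noisy_grad, m, lr=0.1, radius=1.0, wd=1e-4)

Far from the solution the clipped step behaves like normalized steepest descent (step size lr * radius), while near the solution it reduces to plain momentum SGD; the hybrid method in the paper generalizes exactly this switch beyond the Euclidean norm.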

@article{pethick2025_2506.01913,
  title={Generalized Gradient Norm Clipping \& Non-Euclidean $(L_0,L_1)$-Smoothness},
  author={Thomas Pethick and Wanyun Xie and Mete Erdogan and Kimon Antonakopoulos and Tony Silveti-Falls and Volkan Cevher},
  journal={arXiv preprint arXiv:2506.01913},
  year={2025}
}