Extended convexity and smoothness and their applications in deep learning

Abstract

Classical assumptions such as strong convexity and Lipschitz smoothness often fail to capture the nature of deep learning optimization problems, which are typically non-convex and non-smooth, making traditional analyses less applicable. This study aims to elucidate the mechanisms of non-convex optimization in deep learning by extending the conventional notions of strong convexity and Lipschitz smoothness. Leveraging these extended concepts, we prove that, under the stated assumptions, the empirical risk minimization problem is equivalent to optimizing the local gradient norm and the structural error, which together bound the empirical risk from above and below. Furthermore, our analysis demonstrates that the stochastic gradient descent (SGD) algorithm effectively minimizes the local gradient norm, while techniques such as skip connections, over-parameterization, and random parameter initialization help control the structural error. Finally, we validate the core conclusions of this paper through extensive experiments. The theoretical analysis and experimental results indicate that our findings provide new insights into the mechanisms of non-convex optimization in deep learning.
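
To make the role of the two quantities concrete, the following is a minimal, illustrative PyTorch sketch, not the paper's code: it trains a small over-parameterized MLP with a skip connection by SGD on synthetic data and logs the mini-batch empirical risk alongside the local gradient norm (the Euclidean norm of the full parameter gradient). The architecture, data, and hyperparameters are assumptions chosen only for illustration.

# Illustrative sketch (assumed setup, not taken from the paper): SGD on a small
# over-parameterized MLP with a skip connection, logging empirical risk and the
# local gradient norm that the abstract argues SGD drives down.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (illustrative only).
X = torch.randn(256, 10)
y = torch.randn(256, 1)

class SkipMLP(nn.Module):
    """Over-parameterized MLP with an input-to-output skip connection."""
    def __init__(self, d_in=10, width=512):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                    nn.Linear(width, 1))
        self.skip = nn.Linear(d_in, 1, bias=False)

    def forward(self, x):
        return self.hidden(x) + self.skip(x)

model = SkipMLP()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(1, 501):
    idx = torch.randint(0, X.shape[0], (32,))   # random mini-batch
    opt.zero_grad()
    loss = loss_fn(model(X[idx]), y[idx])       # mini-batch empirical risk
    loss.backward()
    # Local gradient norm: Euclidean norm of the concatenated parameter gradient.
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
    opt.step()
    if step % 100 == 0:
        print(f"step {step:4d}  risk {loss.item():.4f}  grad norm {grad_norm.item():.4f}")

In this sketch the logged gradient norm typically decays along with the risk, which is the behavior the paper's bounds are meant to explain; the structural-error side of the bound is a property of the architecture and initialization and is not computed here.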

@article{qi2025_2410.05807,
  title={Extended convexity and smoothness and their applications in deep learning},
  author={Binchuan Qi and Wei Gong and Li Li},
  journal={arXiv preprint arXiv:2410.05807},
  year={2025}
}