Preconditioned Inexact Stochastic ADMM for Deep Model

15 February 2025

Abstract

The recent advancement of foundation models (FMs) has brought about a paradigm shift, revolutionizing various sectors worldwide. The popular optimizers used to train these models are stochastic gradient descent-based algorithms, which face inherent limitations such as slow convergence and stringent convergence assumptions. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. This paper develops an algorithm, PISA (Preconditioned Inexact Stochastic Alternating Direction Method of Multipliers), which enables scalable parallel computing and supports various second-moment schemes. Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables PISA to tackle the challenge of data heterogeneity effectively. Comprehensive experimental evaluations on training or fine-tuning diverse FMs, including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art optimizers.
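
To make the ingredients named in the abstract concrete, the following is a minimal sketch of a generic consensus ADMM loop with a linearized (inexact) local step and a diagonal second-moment preconditioner, run on a toy heterogeneous least-squares problem. It is not the authors' PISA implementation: the toy data, the function names (`make_workers`, `local_grad`, `consensus_admm`), the RMSProp-style second-moment average, and the choice `sigma = 5 * L` are all assumptions made for illustration, and full local gradients are used instead of mini-batches so that the run is deterministic.

```python
import numpy as np

def make_workers(m=4, n=50, d=3, seed=0):
    """Build m heterogeneous least-squares problems (hypothetical toy data)."""
    rng = np.random.default_rng(seed)
    workers = []
    for i in range(m):
        scale = 0.5 * (i + 1)                  # feature scale differs per worker
        A = scale * rng.standard_normal((n, d))
        w_local = np.ones(d) + 0.5 * i         # local optimum differs per worker
        b = A @ w_local + 0.1 * rng.standard_normal(n)
        workers.append((A, b))
    return workers

def local_grad(A, b, w):
    """Gradient of f_i(w) = ||A w - b||^2 / (2 n)."""
    return A.T @ (A @ w - b) / len(b)

def consensus_admm(workers, iters=2000, rho=1.0, beta=0.9):
    """Minimize sum_i f_i(w_i) subject to w_i = w for every worker i."""
    m, d = len(workers), workers[0][0].shape[1]
    # Largest gradient Lipschitz constant over the workers (assumed known here).
    L = max(np.linalg.eigvalsh(A.T @ A / len(b))[-1] for A, b in workers)
    sigma = 5.0 * L                            # penalty parameter, illustrative choice
    w_loc = np.zeros((m, d))                   # local copies w_i
    pi = np.zeros((m, d))                      # multipliers pi_i
    v = np.zeros((m, d))                       # running second moments of gradients
    w = np.zeros(d)
    for _ in range(iters):
        # Global step: exact minimizer of the augmented Lagrangian in w.
        w = np.mean(w_loc + pi / sigma, axis=0)
        for i, (A, b) in enumerate(workers):   # independent across workers
            g = local_grad(A, b, w)
            v[i] = beta * v[i] + (1 - beta) * g**2
            # Inexact local step: f_i is linearized at w and a diagonal
            # proximal term sigma + rho * sqrt(v_i) replaces an exact solve.
            w_loc[i] = w - (g + pi[i]) / (sigma + rho * np.sqrt(v[i]))
            # Dual ascent on the consensus constraint w_i = w.
            pi[i] += sigma * (w_loc[i] - w)
    return w, w_loc

if __name__ == "__main__":
    workers = make_workers()
    w, w_loc = consensus_admm(workers)
    print("consensus solution:", w)
    print("max consensus gap:", np.abs(w_loc - w).max())
```

At a fixed point of this loop each multiplier equals the negative local gradient and the multipliers sum to zero, so the consensus point is a stationary point of the summed objective even though the local optima differ across workers. A stochastic variant would replace `local_grad` with a mini-batch estimate.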

@article{zhou2025_2502.10784,
  title={Preconditioned Inexact Stochastic ADMM for Deep Model},
  author={Shenglong Zhou and Ouya Wang and Ziyan Luo and Yongxu Zhu and Geoffrey Ye Li},
  journal={arXiv preprint arXiv:2502.10784},
  year={2025}
}
