
High Probability Guarantees for Random Reshuffling

Abstract

We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling smooth nonconvex optimization problems. $\mathsf{RR}$ finds broad applications in practice, notably in training neural networks. In this work, we provide high probability first-order and second-order complexity guarantees for this method. First, we establish a high probability first-order sample complexity result for driving the Euclidean norm of the gradient (without taking expectation) below $\varepsilon$. Our derived complexity matches the best existing in-expectation one up to a logarithmic term, without imposing additional assumptions or changing $\mathsf{RR}$'s update rule. We then propose a simple and computable stopping criterion for $\mathsf{RR}$ (denoted $\mathsf{RR}$-$\mathsf{sc}$). This criterion is guaranteed to be triggered after a finite number of iterations, enabling us to prove a high probability first-order complexity guarantee for the last iterate. Second, building on the proposed stopping criterion, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that adds a randomized perturbation procedure near stationary points. We show that $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and establish a high probability second-order complexity result, without requiring any sub-Gaussian tail-type assumptions on the stochastic gradient errors. The fundamental ingredient in deriving these results is a new concentration property for sampling without replacement in $\mathsf{RR}$, which may be of independent interest. Finally, we conduct numerical experiments on neural network training to support our theoretical findings.
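To make the setting concrete, below is a minimal sketch of SGD with random reshuffling together with an epoch-level stopping check. The function `grad_i`, the step size, the tolerance, and the averaged-gradient stopping test are illustrative assumptions for this sketch; they are not the paper's exact $\mathsf{RR}$-$\mathsf{sc}$ criterion or parameter choices.

```python
import numpy as np

def random_reshuffling(grad_i, x0, n, step_size=0.01, num_epochs=100, tol=1e-3):
    """Minimal sketch of SGD with random reshuffling (RR).

    grad_i(x, i): gradient of the i-th component function at x (user-supplied).
    The stopping test on the epoch-averaged gradient norm is a simplified
    stand-in for a computable stopping criterion, not the paper's exact rule.
    """
    x = x0.copy()
    for epoch in range(num_epochs):
        perm = np.random.permutation(n)       # sample indices without replacement
        epoch_grad = np.zeros_like(x)
        for i in perm:                        # one pass over the shuffled data
            g = grad_i(x, i)
            x = x - step_size * g
            epoch_grad += g / n
        if np.linalg.norm(epoch_grad) <= tol:  # simple epoch-level stopping check
            break
    return x
```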

@article{yu2025_2311.11841,
  title={High Probability Guarantees for Random Reshuffling},
  author={Hengxu Yu and Xiao Li},
  journal={arXiv preprint arXiv:2311.11841},
  year={2025}
}