RandALO: Out-of-sample risk estimation in no time flat

15 September 2024

Abstract

Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but forces a poor trade-off between high bias ($K$-fold CV) and high computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO, available on PyPI as randalo and at this https URL.
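The trade-off between $K$-fold and leave-one-out CV described in the abstract can be illustrated with a small sketch. This is not the RandALO algorithm and does not use the randalo package. It assumes ridge regression without an intercept as the model, squared error as the risk, and synthetic Gaussian data; all function names are hypothetical. For this special case, leave-one-out residuals have a well-known exact closed form, $(y_i - \hat{y}_i)/(1 - H_{ii})$, where $H$ is the hat matrix, so the brute-force refits can be checked against a single fit.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||^2 (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def kfold_risk(X, y, lam, k=5):
    """K-fold CV estimate of squared-error risk: k refits on smaller training sets."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), k)
    sq_errs = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        beta = ridge_fit(X[train], y[train], lam)
        sq_errs.append((y[held_out] - X[held_out] @ beta) ** 2)
    return float(np.mean(np.concatenate(sq_errs)))


def loo_risk_bruteforce(X, y, lam):
    """Leave-one-out CV by refitting n times (the expensive end of the trade-off)."""
    return kfold_risk(X, y, lam, k=X.shape[0])


def loo_risk_shortcut(X, y, lam):
    """Exact leave-one-out risk for ridge from one fit, via the hat-matrix diagonal."""
    d = X.shape[1]
    G = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)  # shape (d, n)
    h = np.einsum("ij,ji->i", X, G)  # diagonal of H = X G
    resid = y - X @ (G @ y)  # in-sample residuals
    return float(np.mean((resid / (1.0 - h)) ** 2))


# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
n, d = 60, 20
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta_true + 0.5 * rng.standard_normal(n)
lam = 1.0

risk_5fold = kfold_risk(X, y, lam, k=5)
risk_loo = loo_risk_bruteforce(X, y, lam)
risk_loo_fast = loo_risk_shortcut(X, y, lam)
print(risk_5fold, risk_loo, risk_loo_fast)
```

Here 5-fold CV needs only five fits, but each uses 80% of the data, which is the source of the bias the abstract mentions. Brute-force leave-one-out needs $n$ fits. The closed-form shortcut is specific to ridge regression; models without such a formula are where an approximate leave-one-out estimator becomes relevant.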


@article{nobel2025_2409.09781,
  title={RandALO: Out-of-sample risk estimation in no time flat},
  author={Parth Nobel and Daniel LeJeune and Emmanuel J. Candès},
  journal={arXiv preprint arXiv:2409.09781},
  year={2025}
}
