On the Selection Stability of Stability Selection and Its Applications

14 November 2024

Mahdi Nouraie

Samuel Muller


Main: 16 Pages

6 Figures

Bibliography: 3 Pages

Abstract

Stability selection is a widely adopted resampling-based framework for high-dimensional variable selection. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection results, moving beyond single-variable analysis. We suggest that the stability estimator offers two advantages: it can serve as a reference that reflects the robustness of the results obtained, and it can help identify an optimal regularization value to improve stability. By determining this value, we calibrate the key stability selection parameters, namely the decision threshold and the expected number of falsely selected variables, within established theoretical bounds. The asymptotic distribution of the stability estimator allows us to observe the convergence of stability values over successive sub-samples. This approach sheds light on the required number of sub-samples, addressing a notable gap in prior studies. Pareto optimality of the proposed regularization value is also discussed. The stabplot R package is developed to facilitate the use of the plots featured in this manuscript, supporting their integration into further statistical analysis and research workflows.
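To make the quantities in the abstract concrete, the following is a minimal sketch of how an overall stability value might be computed from the binary selection matrix that stability selection produces (one row per sub-sample, one column per variable). The abstract does not name the estimator, so the formula here is an assumption: it follows the frequency-based form of Nogueira et al. (2018). The function names `selection_frequencies`, `stability`, and `stable_set` and the toy matrix are hypothetical, and none of this reflects the stabplot package's interface.

```python
from typing import List, Sequence


def selection_frequencies(Z: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of sub-samples in which each variable was selected."""
    m, p = len(Z), len(Z[0])
    return [sum(row[j] for row in Z) / m for j in range(p)]


def stability(Z: Sequence[Sequence[int]]) -> float:
    """Overall stability of a 0/1 selection matrix Z (rows = sub-samples).

    Assumed form (Nogueira et al., 2018):
        1 - mean_j(s_j^2) / ((k/p) * (1 - k/p)),
    where s_j^2 is the unbiased sample variance of column j and k is the
    average number of variables selected per sub-sample.
    """
    m, p = len(Z), len(Z[0])
    if m < 2:
        raise ValueError("need at least two sub-samples")
    freqs = selection_frequencies(Z)
    # Unbiased variance of a 0/1 column with mean f is m/(m-1) * f * (1 - f).
    mean_var = sum(m / (m - 1) * f * (1 - f) for f in freqs) / p
    k_bar = sum(freqs)  # average selected-set size across sub-samples
    denom = (k_bar / p) * (1 - k_bar / p)
    if denom == 0:
        raise ValueError("undefined when every sub-sample selects all or no variables")
    return 1 - mean_var / denom


def stable_set(Z: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Indices of variables whose selection frequency reaches the decision threshold."""
    return [j for j, f in enumerate(selection_frequencies(Z)) if f >= threshold]


# Toy selection matrix (hypothetical): 4 sub-samples, 5 candidate variables.
Z_toy = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
]
print(selection_frequencies(Z_toy))
print(stability(Z_toy))
print(stable_set(Z_toy, threshold=0.75))
```

Under these assumptions, one could recompute `stability` for the selection matrix obtained at each candidate regularization value and compare the results, or recompute it as rows are appended to see whether the value settles as the number of sub-samples grows.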
