Fast and General Model Selection using Data Depth and Resampling

We propose a technique that uses data-depth functions and resampling to simultaneously assign a score, called an \textit{e-value}, to each candidate statistical model and to conduct inference within that model, in a very general framework. The \textit{e-value} can be used to select models, and we establish that, under general conditions, it separates statistical models that adequately explain properties of the data from those that do not. Our resampling-based approach concurrently ranks models and consistently approximates the sampling distribution of the parameter estimators under any model, thus enabling inference within each model. Consequently, our proposal is one of simultaneous model discovery and inference. This results in a fast and parallel algorithm that fits only a single model and evaluates $p+1$ models, where $p$ is the number of predictors under consideration, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. We illustrate in simulation experiments that our proposed method typically performs better than, or competitively with, currently used model selection methods, both in linear models and in fixed-effect selection in linear mixed models. As a real-data application, we use our procedure to elicit climatic drivers of Indian summer monsoon precipitation.