General model discovery using statistical evaluation maps

On any given dataset, a very wide variety of statistical models may be applicable, based on experience and tradition, robustness and sensitivity requirements, algorithmic, computational or philosophical considerations, risk and eventual usage. We propose a technique to compare such models in a very general framework. We establish that under general conditions, statistical models that adequately explain properties of the data can be well separated from those that do not. Our resampling-based approach achieves concurrent ranking of models and consistent approximation of the sampling distribution of parameter estimators under any model, thus enabling inference within each model. Consequently, our proposal is one of simultaneous model discovery and inference. For traditional covariate selection problems where there are p covariates, our proposal results in a fast and parallel algorithm that fits only a single model and evaluates p + 1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. We illustrate in simulation experiments that our proposed method typically performs better than or competitively with currently used methods for model selection. We use our procedure to elicit climatic drivers of Indian monsoon precipitation.