
The out-of-sample $R^2$: estimation and inference

Abstract

Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessing the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. As opposed to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data splitting estimates to provide a standard error for the $\hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data.
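
To make the idea of an out-of-sample $R^2$ as a comparison of two predictive models concrete, the following is a minimal sketch, not the estimator proposed in the paper. It assumes a plain plug-in form, one minus the ratio of the cross-validated mean squared error of a linear model to that of a null model that predicts the training mean. The function name `oos_r2_cv`, the choice of ordinary least squares, the fold count, and the simulated data are all illustrative assumptions, and no standard error is computed.

```python
import numpy as np

def oos_r2_cv(X, y, n_folds=10, seed=0):
    """Plug-in cross-validated out-of-sample R^2 (illustrative assumption):
    1 - MSE(linear model) / MSE(null model predicting the training mean)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Randomly partition the observation indices into n_folds test sets.
    folds = np.array_split(rng.permutation(n), n_folds)
    sq_err_model = np.empty(n)
    sq_err_null = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Design matrices with an intercept column.
        X_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        X_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        # Fit ordinary least squares on the training part only.
        beta, *_ = np.linalg.lstsq(X_train, y[train_mask], rcond=None)
        # Squared errors on held-out observations for both models.
        sq_err_model[test_idx] = (y[test_idx] - X_test @ beta) ** 2
        sq_err_null[test_idx] = (y[test_idx] - y[train_mask].mean()) ** 2
    return 1.0 - sq_err_model.mean() / sq_err_null.mean()

# Simulated data (hypothetical): two informative predictors, one noise predictor.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)
r2 = oos_r2_cv(X, y)

# Pure-noise outcome: the model should not beat the null model by much.
y_noise = rng.normal(size=200)
r2_noise = oos_r2_cv(X, y_noise)
print(f"signal: {r2:.3f}, noise: {r2_noise:.3f}")
```

Unlike the in-sample $R^2$, this quantity can be negative, which happens when the model predicts held-out data worse than the training mean does.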
