$V$-fold cross-validation and $V$-fold penalization in least-squares density estimation

22 October 2012

Abstract

This paper studies $V$-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing $V$ so as to minimize the least-squares risk of the selected estimator.

We first prove a non-asymptotic oracle inequality for $V$-fold cross-validation and its bias-corrected version ($V$-fold penalization), with an upper bound that decreases as a function of $V$. In particular, this result implies that $V$-fold penalization is asymptotically optimal.

Then, we compute the variance of $V$-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on $V$ like $1+1/(V-1)$ (at least in some particular cases), suggesting that performance improves markedly from $V=2$ to $V=5$ or $10$ and then remains almost constant.

Overall, this explains the common advice to take $V=10$ (at least in our setting and when computational power is limited), as confirmed by some simulation experiments.
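To make the two criteria concrete, here is a minimal Python sketch (standard library only). It assumes, for illustration, that the candidate models are regular histograms with $D$ bins on $[0,1)$, that the $V$ blocks form a regular partition of the sample, and that the least-squares contrast is $\gamma(t;x) = \|t\|^2 - 2t(x)$. The $V$-fold cross-validation criterion averages the hold-out contrast $P_n^{(j)}\gamma(\hat s_m^{(-j)})$ over the blocks; the $V$-fold penalty is taken here as $\mathrm{pen}_{VF}(m) = \frac{C}{V}\sum_{j=1}^{V} (P_n - P_n^{(-j)})\gamma(\hat s_m^{(-j)})$ with $C = V-1$, added to the empirical risk $P_n\gamma(\hat s_m)$. The histogram models, the constant $C$, and all function names (`hist_heights`, `ls_contrast`, `make_blocks`, `vfcv_and_vfpen`, `select`) are illustrative assumptions and are not taken from the paper.

```python
import random

# Sketch under an ASSUMED setting: regular histograms on [0, 1) as models,
# V-fold penalty with constant C = V - 1. All names here are hypothetical.


def bin_index(x, n_bins):
    """Index of the regular bin of [0, 1) containing x (x = 1 goes to the last bin)."""
    return min(int(x * n_bins), n_bins - 1)


def hist_heights(sample, n_bins):
    """Least-squares (projection) estimator on the regular histogram model
    with n_bins bins on [0, 1): height = bin frequency / bin width."""
    counts = [0] * n_bins
    for x in sample:
        counts[bin_index(x, n_bins)] += 1
    return [c * n_bins / len(sample) for c in counts]


def ls_contrast(heights, sample):
    """Empirical least-squares contrast ||t||^2 - (2/|S|) * sum_{x in S} t(x)."""
    n_bins = len(heights)
    sq_norm = sum(h * h for h in heights) / n_bins  # every bin has width 1/n_bins
    mean_t = sum(heights[bin_index(x, n_bins)] for x in sample) / len(sample)
    return sq_norm - 2.0 * mean_t


def make_blocks(n, V, rng):
    """Regular partition of range(n) into V blocks whose sizes differ by at most 1."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[j::V] for j in range(V)]


def vfcv_and_vfpen(data, n_bins, blocks):
    """Return (V-fold CV criterion, V-fold penalized criterion with C = V - 1)."""
    V = len(blocks)
    cv, pen_sum = 0.0, 0.0
    for block in blocks:
        held_out = set(block)
        train = [x for i, x in enumerate(data) if i not in held_out]
        valid = [data[i] for i in block]
        s_minus_j = hist_heights(train, n_bins)  # estimator trained without block j
        cv += ls_contrast(s_minus_j, valid) / V
        # (P_n - P_n^{(-j)}) gamma(s^{(-j)}): full-sample minus training-sample contrast
        pen_sum += ls_contrast(s_minus_j, data) - ls_contrast(s_minus_j, train)
    pen = (V - 1) / V * pen_sum
    crit_pen = ls_contrast(hist_heights(data, n_bins), data) + pen
    return cv, crit_pen


def select(data, V, max_bins, seed=0):
    """Number of bins chosen by V-fold CV and by V-fold penalization."""
    blocks = make_blocks(len(data), V, random.Random(seed))
    scores = {D: vfcv_and_vfpen(data, D, blocks) for D in range(1, max_bins + 1)}
    d_cv = min(scores, key=lambda D: scores[D][0])
    d_pen = min(scores, key=lambda D: scores[D][1])
    return d_cv, d_pen


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.betavariate(2.0, 5.0) for _ in range(500)]  # hypothetical sample
    for V in (2, 5, 10):
        print(V, select(data, V, max_bins=30))
```

Calling `select(data, V, max_bins)` for several values of $V$ returns the number of bins chosen by each criterion; repeating this over many simulated samples is one way to observe empirically how $V$ affects the risk of the selected estimator. For reference, the factor $1+1/(V-1)$ quoted above equals $2$ at $V=2$, $1.25$ at $V=5$ and about $1.11$ at $V=10$.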
