
Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression

Abstract

We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is proportional to the dimension of the model, at least for some typical heteroscedastic model selection problems. In particular, Mallows' $C_p$ is suboptimal in this framework, as well as any "linear" penalty depending on both the data and their true distribution. On the contrary, optimal model selection is possible in this framework with data-driven penalties such as $V$-fold or resampling penalties (Arlot, 2008a,b). Therefore, estimating the "shape" of the penalty from the data is useful, even at the price of a higher computational cost.
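
To make the notion of a penalty "proportional to the dimension" concrete, the following is a minimal sketch, not the paper's own code or experimental setup. It assumes the models are regular histograms (piecewise-constant fits) on [0, 1) and selects the dimension D minimizing the empirical risk plus K * D / n. The function names `histogram_fit` and `select_linear_penalty`, the regression function, and the noise levels are all hypothetical choices made for illustration. Taking K equal to twice an average noise variance gives a Mallows-type penalty; with heteroscedastic noise there is no single variance, so using the average is an assumption of this sketch.

```python
import numpy as np


def histogram_fit(x, y, n_bins):
    """Least-squares fit of y on piecewise-constant functions over
    n_bins regular bins of [0, 1); returns the fitted values."""
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins, weights=y, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    # Bin means; empty bins get 0 (they contain no data points anyway).
    means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    return means[bins]


def select_linear_penalty(x, y, dims, K):
    """Return the dimension D in dims minimizing
    empirical risk + K * D / n (a penalty proportional to D)."""
    n = len(y)
    crit = [np.mean((y - histogram_fit(x, y, D)) ** 2) + K * D / n for D in dims]
    return dims[int(np.argmin(crit))]


# Illustrative heteroscedastic data (assumed, not taken from the paper):
# the noise level is large on [0, 0.5) and small on [0.5, 1).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(size=n)
sigma = np.where(x < 0.5, 1.0, 0.1)
y = np.sin(2 * np.pi * x) + sigma * rng.standard_normal(n)

# Mallows-type choice: K = 2 * (average noise variance), assumed known here.
K_mallows = 2 * np.mean(sigma ** 2)
chosen_dim = select_linear_penalty(x, y, dims=list(range(1, 51)), K=K_mallows)
```

Whatever the value of K, such a penalty depends on a model only through its dimension, so it cannot account for where in [0, 1) the noise is large. Data-driven penalties such as the $V$-fold or resampling penalties mentioned above instead estimate the shape of the penalty from the data.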
