Data-generating models under which the random forest algorithm performs badly
Computational Statistics (Comput. Stat.), 2019
Abstract
Examples are given of data-generating models under which some versions of the random forest algorithm may fail to be consistent or at least may be extremely slow to converge to the optimal predictor. Evidence provided for these properties is based on partly intuitive and partly rigorous arguments and on numerical experiments. Although one can always choose a model under which random forests perform very badly, in each case simple methods based on statistics of 'variable use' and 'variable importance' can be used to construct a better predictor based on a sort of mixture of random forests.
