
Complete Analysis of a Random Forest Model

Abstract

Random forests have become an important tool for improving accuracy in regression problems since their popularization by [Breiman, 2001] and others. In this paper, we revisit a random forest model originally proposed by [Breiman, 2004] and later studied by [Biau, 2012], where a feature is selected at random and the split occurs at the midpoint of the box containing the chosen feature. If the Lipschitz regression function is sparse and depends only on a small, unknown subset of $S$ out of $d$ features, we show that, given $n$ observations, this random forest model outputs a predictor with mean-squared prediction error $O\big((n(\sqrt{\log n})^{S-1})^{-\frac{1}{S\log 2+1}}\big)$. When $S \leq \lfloor 0.72 d \rfloor$, this rate is significantly better than the minimax optimal rate $\Theta(n^{-\frac{2}{d+2}})$ for Lipschitz function classes in $d$ dimensions. The second part of this article shows that the prediction error of this random forest model cannot generally be improved. As a striking consequence of our analysis, we show that if $\ell_{avg}$ (resp. $\ell_{max}$) is the average (resp. maximum) number of observations per leaf node, then the variance of this forest is $\Theta(\ell_{avg}^{-1}(\sqrt{\log n})^{-(S-1)})$. When $S = d$, this variance is similar in form to the best-case variance lower bound $\Omega(\ell_{max}^{-1}(\log n)^{-(d-1)})$ of [Lin and Jeon, 2006] for any random forest model with a nonadaptive splitting scheme (i.e., one in which the split protocol is independent of the data). We also show that the bias bound is tight for any linear model with a nonzero parameter vector. Finally, a side consequence of our analysis is that if the regression function is merely square-integrable (in particular, it need not be continuous or bounded), then the random forest predictor is pointwise consistent almost everywhere.
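As a back-of-the-envelope check on the threshold $\lfloor 0.72 d \rfloor$ (our own reading of the rate, not a derivation from the paper), one can compare the polynomial-in-$n$ exponents of the two rates directly, ignoring the polylogarithmic factor $(\sqrt{\log n})^{S-1}$, which only improves the forest rate:

$$\frac{1}{S\log 2 + 1} > \frac{2}{d+2} \iff S\log 2 + 1 < \frac{d+2}{2} \iff S < \frac{d}{2\log 2} \approx 0.7213\, d,$$

so the forest rate dominates the minimax rate precisely when $S$ is at most roughly $0.72 d$, matching the stated condition.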
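For a concrete picture of the splitting scheme, the following minimal Python sketch implements the midpoint-split random forest described above, reconstructed from the verbal description rather than taken from the paper; the tree depth, number of trees, and the choice to predict the leaf average are illustrative assumptions.

    import random

    def build_tree(X, y, box, depth):
        """Grow one randomized tree on the rectangular cell `box`."""
        if depth == 0 or len(y) <= 1:
            # Leaf: predict the average response of the observations in the cell.
            return sum(y) / len(y) if y else 0.0
        j = random.randrange(len(box))        # feature selected uniformly at random
        lo, hi = box[j]
        mid = (lo + hi) / 2.0                 # split at the midpoint of the box side
        left = [i for i, x in enumerate(X) if x[j] <= mid]
        right = [i for i, x in enumerate(X) if x[j] > mid]
        left_box = list(box); left_box[j] = (lo, mid)
        right_box = list(box); right_box[j] = (mid, hi)
        return (j, mid,
                build_tree([X[i] for i in left], [y[i] for i in left], left_box, depth - 1),
                build_tree([X[i] for i in right], [y[i] for i in right], right_box, depth - 1))

    def predict_tree(node, x):
        while isinstance(node, tuple):        # descend until a leaf value is reached
            j, mid, l, r = node
            node = l if x[j] <= mid else r
        return node

    def forest_predict(trees, x):
        # The forest averages the predictions of the independently randomized trees.
        return sum(predict_tree(t, x) for t in trees) / len(trees)

    # Toy usage: a sparse target that depends on only the first of d = 5 features.
    random.seed(0)
    d, n = 5, 500
    X = [[random.random() for _ in range(d)] for _ in range(n)]
    y = [x[0] for x in X]
    trees = [build_tree(X, y, [(0.0, 1.0)] * d, depth=6) for _ in range(100)]
    print(forest_predict(trees, [0.3] * d))   # should be close to 0.3

Note that the splits depend only on the cells, never on the responses, which is exactly the nonadaptive property under which the [Lin and Jeon, 2006] lower bound applies.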
