Analyzing CART

Abstract

Decision trees with binary splits are commonly constructed using the Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART. In doing so, we find that the training error is governed by Pearson's correlation between the optimal decision stump and the response data in each node, which we bound by solving a quadratic program. We leverage this to show that CART with cost-complexity pruning achieves a good bias-variance tradeoff when the depth scales with the logarithm of the sample size. Data-dependent quantities, which adapt to the local dimensionality and structure of the regression surface, are seen to govern the rates of convergence of the prediction error.
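The split criterion described above can be illustrated with a short sketch (not the paper's code): for a single variable, scan the candidate split points and pick the one that maximizes the reduction in sum of squares error between the parent node and the two daughter nodes. The function name `best_split` and the running-sum implementation are illustrative choices, not taken from the paper.

```python
import numpy as np

def best_split(x, y):
    """Return (split point, impurity reduction) for the CART-style
    split on one variable that maximizes the decrease in SSE."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    total_sse = np.sum((ys - ys.mean()) ** 2)
    tot_sum, tot_sq = ys.sum(), np.sum(ys ** 2)
    best_point, best_gain = None, 0.0
    left_sum = left_sq = 0.0
    # Candidate splits lie between consecutive distinct x values.
    for i in range(n - 1):
        left_sum += ys[i]
        left_sq += ys[i] ** 2
        if xs[i] == xs[i + 1]:
            continue
        nl, nr = i + 1, n - i - 1
        # SSE of each daughter node around its own mean.
        sse_left = left_sq - left_sum ** 2 / nl
        sse_right = (tot_sq - left_sq) - (tot_sum - left_sum) ** 2 / nr
        gain = total_sse - sse_left - sse_right
        if gain > best_gain:
            best_point, best_gain = (xs[i] + xs[i + 1]) / 2, gain
    return best_point, best_gain

# Example: a step function is split exactly at the jump.
s, g = best_split(np.array([1.0, 2.0, 3.0, 4.0]),
                  np.array([0.0, 0.0, 1.0, 1.0]))
# s == 2.5; the gain equals the parent SSE since both daughters are pure.
```

The fitted stump (the piecewise-constant predictor induced by this split) is the "optimal decision stump" whose correlation with the response the paper analyzes.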
