Some asymptotic results of survival tree and forest models

30 July 2017

Abstract

As one of the most popular extensions of random forests, survival forest models lack established theoretical results and a unified theoretical framework. We first investigate the method from the aspect of splitting rules, where the survival curves of the two potential child nodes are calculated and compared. We show that existing approaches lead to potentially biased estimates of the within-node survival and cause non-optimal selection of splitting rules. This bias is due to the censoring distribution and the non-i.i.d. samples within each node. Based on this observation, we develop an adaptive concentration bound result for both the tree and forest versions of survival tree models. The result quantifies the variance component of survival forest models. Furthermore, we show with three particular examples that consistency results can be obtained. Specifically, the three cases are: 1) a finite-dimensional setting with random splitting rules; 2) an infinite-dimensional case with marginal signal checking; and 3) an infinite-dimensional setting with a principled Cox screening splitting rule. The development of these results serves as a general framework for showing the consistency of tree- and forest-based survival models.
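To make the splitting step concrete, the following minimal sketch shows one common way the survival curves of two candidate child nodes can be calculated and compared: a Kaplan-Meier estimate within each node and a log-rank statistic between them. The abstract does not name the specific estimators used by existing approaches, so the choice of Kaplan-Meier and log-rank here is an assumption, and the function names `kaplan_meier` and `logrank_statistic` are hypothetical. This is the kind of within-node estimate that the abstract argues can be biased by the censoring distribution; the sketch does not implement any correction.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate within one node.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    Returns a list of (t, S(t)) at each distinct observed event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank_statistic(times_l, events_l, times_r, events_r):
    """Log-rank chi-square statistic comparing left and right child nodes.

    Larger values indicate a larger difference between the two nodes'
    survival experience (assumed splitting criterion, for illustration).
    """
    times_l, events_l = list(times_l), list(events_l)
    times_r, events_r = list(times_r), list(events_r)
    pooled = zip(times_l + times_r, events_l + events_r)
    event_times = sorted({t for t, e in pooled if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_l = sum(1 for s in times_l if s >= t)  # at risk in left node
        n_r = sum(1 for s in times_r if s >= t)  # at risk in right node
        d_l = sum(1 for s, e in zip(times_l, events_l) if s == t and e)
        d_r = sum(1 for s, e in zip(times_r, events_r) if s == t and e)
        n, d = n_l + n_r, d_l + d_r
        observed_minus_expected += d_l - d * n_l / n
        if n > 1:
            variance += d * (n_l / n) * (n_r / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance
```

A tree-building procedure of this kind would evaluate `logrank_statistic` for each candidate split and keep the split with the largest value.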
