We consider supervised learning with random decision trees, where the tree construction is completely random. The method has long been used as a heuristic that works well in practice despite the simplicity of the setting, but with almost no theoretical guarantees. The goal of this paper is to shed new light on the entire paradigm. We provide strong theoretical guarantees for learning with random decision trees. We present and compare three variants of the algorithm, all with minimal memory requirements: majority voting, threshold averaging, and probabilistic averaging. The random structure of the trees makes the setting naturally compatible with differential privacy, so we also propose differentially-private versions of all three schemes. We give upper bounds on the generalization error and mathematically explain how the accuracy depends on the number of random decision trees. Furthermore, we prove that only a logarithmic number of independently selected random decision trees suffices to correctly classify most of the data, even when differential-privacy guarantees must be maintained. To the best of our knowledge, such an analysis has not been carried out before. We show empirically that majority voting and threshold averaging give the best accuracy, also for conservative users requiring strong privacy guarantees. In particular, the simple majority voting rule, which had not previously been considered in the context of differentially-private learning, is an especially good candidate for a differentially-private classifier, since it is far less sensitive to the choice of forest parameters than the other methods.
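To make the three aggregation schemes concrete, here is a minimal, self-contained Python sketch (our illustration, not the paper's code): completely random trees whose split structure is data-independent, per-leaf label fractions as the only data-dependent statistics, an optional Laplace-noise step standing in for the differentially-private variants, and the three aggregation rules named in the abstract. All function names, the fixed-depth tree layout, and the noise calibration below are our assumptions; the paper's exact scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_random_tree(n_features, depth):
    # Completely random structure: every internal node picks a random
    # feature and a random threshold, independently of the training data.
    n_internal = 2 ** depth - 1
    feats = rng.integers(0, n_features, size=n_internal)
    thresh = rng.random(n_internal)  # assumes features scaled to [0, 1]
    return feats, thresh, depth

def leaf_index(tree, x):
    # Route x down the tree; return its leaf id in [0, 2**depth).
    feats, thresh, depth = tree
    node = 0
    for _ in range(depth):
        node = 2 * node + (2 if x[feats[node]] > thresh[node] else 1)
    return node - (2 ** depth - 1)

def leaf_fractions(tree, X, y, eps=None):
    # Per-leaf fraction of positive labels -- the only data-dependent step.
    depth = tree[2]
    total = np.zeros(2 ** depth)
    pos = np.zeros(2 ** depth)
    for xi, yi in zip(X, y):
        l = leaf_index(tree, xi)
        total[l] += 1
        pos[l] += yi
    if eps is not None:
        # Hypothetical DP variant (ours): perturb the counts with the
        # standard Laplace mechanism; the paper's calibration may differ.
        total = total + rng.laplace(0, 2 / eps, size=total.shape)
        pos = pos + rng.laplace(0, 2 / eps, size=pos.shape)
    return np.clip(pos / np.maximum(total, 1), 0, 1)

def predict(trees, fracs_per_tree, x, rule="majority"):
    scores = np.array([f[leaf_index(t, x)]
                       for t, f in zip(trees, fracs_per_tree)])
    if rule == "majority":       # each tree casts a hard 0/1 vote
        return int((scores > 0.5).mean() > 0.5)
    if rule == "threshold":      # average soft scores, then threshold
        return int(scores.mean() > 0.5)
    if rule == "probabilistic":  # sample the label with prob. = averaged score
        return int(rng.random() < scores.mean())
    raise ValueError(rule)

# Usage on synthetic data (illustrative only):
X = rng.random((500, 5))
y = (X[:, 0] > 0.5).astype(int)
trees = [build_random_tree(n_features=5, depth=4) for _ in range(25)]
fracs = [leaf_fractions(t, X, y) for t in trees]              # non-private
dp_fracs = [leaf_fractions(t, X, y, eps=1.0) for t in trees]  # private
print(predict(trees, fracs, X[0], rule="threshold"))
```

The design point the sketch illustrates is the one the abstract relies on: because the tree structure is chosen independently of the data, only the leaf counts touch private records, so adding noise to those counts is the only change needed for the differentially-private versions.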