

arXiv: 2210.10781
Generalization Properties of Decision Trees on Real-valued and Categorical Features

18 October 2022
Jean-Samuel Leboeuf
F. Leblanc
M. Marchand
Abstract

We revisit binary decision trees from the perspective of partitions of the data. We introduce the notion of a partitioning function, and we relate it to the growth function and to the VC dimension. We consider three types of features: real-valued, categorical ordinal, and categorical nominal, each with different split rules. For each feature type, we upper bound the partitioning function of the class of decision stumps before extending the bounds to the class of general decision trees (of any fixed structure) using a recursive approach. Using these new results, we find the exact VC dimension of decision stumps on examples of $\ell$ real-valued features, which is given by the largest integer $d$ such that $2\ell \ge \binom{d}{\lfloor d/2 \rfloor}$. Furthermore, we show that the VC dimension of a binary tree structure with $L_T$ leaves on examples of $\ell$ real-valued features is in $O(L_T \log(L_T \ell))$. Finally, we elaborate a pruning algorithm based on these results that performs better than the cost-complexity and reduced-error pruning algorithms on a number of data sets, with the advantage that no cross-validation is required.
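The exact VC-dimension formula stated in the abstract is easy to evaluate numerically. The sketch below (function name ours, not from the paper) searches for the largest integer $d$ satisfying $2\ell \ge \binom{d}{\lfloor d/2 \rfloor}$:

```python
from math import comb

def stump_vc_dimension(num_features: int) -> int:
    """Exact VC dimension of decision stumps on `num_features`
    real-valued features, per the abstract's formula: the largest
    integer d such that 2*ell >= C(d, floor(d/2))."""
    d = 1  # d = 1 always qualifies: 2*ell >= C(1, 0) = 1 for ell >= 1
    while 2 * num_features >= comb(d + 1, (d + 1) // 2):
        d += 1
    return d
```

For example, a stump on a single real-valued feature ($\ell = 1$) has VC dimension 2: it can shatter two points with a threshold and leaf labels, but not the labeling $(+,-,+)$ of three ordered points.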
