
A Gap Between Decision Trees and Neural Networks

Akash Kumar
Main: 19 pages
Figures: 10
Bibliography: 3 pages
Appendix: 23 pages
Abstract

We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($\mathrm{RTV}$) seminorm, which controls the geometric complexity of level sets.

We first show that the hard tree indicator $1_A$ has infinite $\mathrm{RTV}$. Moreover, two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, also have infinite $\mathrm{RTV}$ in dimensions $d > 1$, while Gaussian convolution yields finite $\mathrm{RTV}$ but with an explicit exponential dependence on $d$.

We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to $1_A$). For classification, we construct a smooth barrier score $S_A$ with finite $\mathrm{RTV}$ whose fixed threshold $\tau = 1$ exactly recovers the box. Under a mild tube-mass condition near $\partial A$, we prove an $L_1(P)$ calibration bound that decays polynomially in a sharpness parameter, along with an explicit $\mathrm{RTV}$ upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy-complexity tradeoff and how threshold selection shifts where training lands along it.
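The separation between thresholded classification and calibrated score learning can be illustrated with a toy computation. The sketch below is not the paper's barrier construction $S_A$; it uses a generic product-of-sigmoids surrogate for a box (the function `soft_box_score`, the sharpness parameter `k`, and the threshold value are all illustrative assumptions). It shows that the 0-1 error after thresholding can be far smaller than the $L_1(P)$ gap between the smooth score and the hard indicator $1_A$ at the same finite sharpness.

```python
# A minimal sketch (hypothetical surrogate, not the paper's S_A):
# threshold a smooth "soft box" score and compare classification error
# with calibration error against the hard indicator.
import numpy as np

def soft_box_score(x, a, b, k=20.0):
    """Smooth surrogate score for the box prod_i [a_i, b_i].

    Per coordinate, sigmoid(k*(x_i - a_i)) * sigmoid(k*(b_i - x_i))
    rises from ~0 outside [a_i, b_i] to ~1 inside; the product over
    coordinates concentrates on the box as the sharpness k grows.
    """
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    per_coord = sig(k * (x - a)) * sig(k * (b - x))
    return per_coord.prod(axis=-1)

rng = np.random.default_rng(0)
d = 2
a, b = np.full(d, 0.25), np.full(d, 0.75)
X = rng.uniform(0.0, 1.0, size=(100_000, d))  # P = Uniform([0,1]^d)

scores = soft_box_score(X, a, b, k=60.0)
in_box = np.all((X >= a) & (X <= b), axis=-1)

# Classification after thresholding: away from corners the score crosses
# ~0.5 exactly at the faces, so tau = 0.5 nearly recovers the box.
pred = scores >= 0.5
print("0-1 error of thresholded score:", np.mean(pred != in_box))

# Score learning is a harder target: the L1(P) gap to the indicator 1_A
# stays bounded away from 0 at any finite sharpness k.
print("L1(P) gap to indicator 1_A:", np.mean(np.abs(scores - in_box)))
```

Under this toy surrogate the thresholded classifier errs only in corner neighborhoods of width on the order of $1/k$, while the calibration gap accumulates along the entire boundary tube, which is one way to see why the two goals trade off differently against score smoothness.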
