What Functions Does XGBoost Learn?

Dohyeong Ki
Adityanand Guntuboyina
Main: 16 pages, Bibliography: 3 pages, Appendix: 44 pages
Abstract

This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ with penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ and $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ in terms of Hardy--Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}} : V^{d, s}_{\infty-\text{XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
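
To make the two optimization problems in the abstract concrete, here is a sketch in the abstract's notation. The squared-error loss and the tuning parameter $\lambda$ are assumptions on our part (the paper may use a more general loss or a different parametrization); only the function class and the penalty come from the abstract.

% Penalized form: the problem the abstract says every XGBoost optimizer solves
% (lambda and squared error are illustrative assumptions, not from the abstract).
\[
  \hat{f} \in \operatorname*{argmin}_{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}}}
  \left\{ \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
          + \lambda \, V^{d, s}_{\infty-\text{XGB}}(f) \right\}
\]
% Constrained form: the least squares estimator for which the abstract proves
% the nearly minimax-optimal rate n^{-2/3} (log n)^{4(min(s,d)-1)/3}.
\[
  \hat{f}_{\mathrm{LS}} \in
  \operatorname*{argmin}_{\substack{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}} \\ V^{d, s}_{\infty-\text{XGB}}(f) \le V}}
  \; \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
\]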
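
For readers who want to connect the penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ back to practice: the $L^1$ penalty the abstract refers to is exposed in the xgboost library as reg_alpha, an $L^1$ penalty on leaf weights. The following is a minimal sketch, not taken from the paper; the synthetic data and the reading of max_depth as the paper's depth bound s are illustrative assumptions.

import numpy as np
import xgboost as xgb

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=500)

model = xgb.XGBRegressor(
    max_depth=2,      # bounded-depth trees; we assume this plays the role of s
    n_estimators=200,
    reg_alpha=1.0,    # L^1 penalty on leaf weights, the penalty V generalizes
    reg_lambda=0.0,   # disable the default L^2 penalty to isolate the L^1 term
)
model.fit(X, y)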
