Prediction with Missing Data via Bayesian Additive Regression Trees

We present a method for handling missing data in tree-based machine learning methods. We focus on Bayesian Additive Regression Trees (BART) enhanced with "Missingness Incorporated in Attributes," an approach recently proposed for missingness in decision trees. This approach does not require imputation, but rather takes advantage of the partitioning mechanisms found in tree-based models. Simulations on generated models and real data indicate that our proposed method can figure out complicated missing-at-random and not-missing-at-random models as well as models where missingness itself influences the response. We also illustrate BART's abilities to incorporate missingness into uncertainty and estimates and to detect the influence of missingness on the model fit.
View on arXiv