107
v1v2 (latest)

Closed-Form Beta Distribution Estimation from Sparse Statistics with Random Forest Implicit Regularization

Main:28 Pages
14 Figures
Bibliography:3 Pages
2 Tables
Abstract

This work advances distribution recovery from sparse data and ensemble classification through three main contributions. First, we introduce a closed-form estimator that reconstructs scaled beta distributions from limited statistics (minimum, maximum, mean, and median) via composite quantile and moment matching. The recovered parameters (α,β)(\alpha,\beta), when used as features in Random Forest classifiers, improve pairwise classification on time-series snapshots, validating the fidelity of the recovered distributions. Second, we establish a link between classification accuracy and distributional closeness by deriving error bounds that constrain total variation distance and Jensen-Shannon divergence, the latter exhibiting quadratic convergence. Third, we show that zero-variance features act as an implicit regularizer, increasing selection probability for mid-ranked predictors and producing deeper, more varied trees. A SeatGeek pricing dataset serves as the primary application, illustrating distributional recovery and event-level classification while situating these methods within the structure and dynamics of the secondary ticket marketplace. The UCI handwritten digits dataset confirms the broader regularization effect. Overall, the study outlines a practical route from sparse distributional snapshots to closed-form estimation and improved ensemble accuracy, with reliability enhanced through implicit regularization.

View on arXiv
Comments on this paper