TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$, and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with a vastly varying number of pieces. We provide a method that estimates this number near-optimally and hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.
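
One way to make the factor $c_{t,d}$ concrete is sketched below. The notation is an assumption introduced here for illustration and is not taken from the abstract: $\mathcal{P}_{t,d}$ denotes the class of $t$-piece degree-$d$ polynomials, $\hat f_n$ an estimator built from $n$ samples of $f$, and $\epsilon_n$ a statistical error term that vanishes as $n$ grows. Under that reading, an estimator achieves factor $c$ if, for every distribution $f$,

$$
\lVert \hat f_n - f \rVert_1 \;\le\; c \cdot \min_{p \in \mathcal{P}_{t,d}} \lVert f - p \rVert_1 \;+\; \epsilon_n ,
$$

and $c_{t,d}$ is the smallest $c$ for which such an estimator exists. In these terms, the abstract's claims read as $c_{1,0}=1$, $c_{t,d}\ge 2$ for every $(t,d)\ne(1,0)$, and a matching upper bound $c_{t,d}\le 2$ attained by the proposed estimator.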
