The Broad Optimality of Profile Maximum Likelihood

Abstract

We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size $k$ and desired accuracy $\varepsilon$:

\textbf{Distribution estimation.} Under $\ell_1$ distance, PML yields optimal $\Theta(k/(\varepsilon^2\log k))$ sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution.

\textbf{Additive property estimation.} For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence.

\textbf{$\boldsymbol{\alpha}$-Rényi entropy estimation.} For integer $\alpha>1$, the PML plug-in estimator has optimal $k^{1-1/\alpha}$ sample complexity; for non-integer $\alpha>3/4$, it has sample complexity lower than the state of the art.

\textbf{Identity testing.} In testing whether an unknown distribution is equal to, or at least $\varepsilon$-far from, a given distribution in $\ell_1$ distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of $k$.

Most of these results also hold for a near-linear-time computable variant of PML. Stronger results hold for a different and novel variant called truncated PML (TPML).
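To make the PML and plug-in ideas above concrete, here is a minimal Python sketch, under illustrative assumptions only: it computes a sample's profile (the multiset of symbol multiplicities), finds a PML distribution by brute-force grid search over a tiny simplex, and plugs it into a property (Shannon entropy). All names (`profile`, `profile_probability`, `pml_estimate`) are hypothetical, and this exhaustive search is nothing like the paper's near-linear-time computation.

```python
# Illustrative sketch only -- not the paper's algorithm.
import math
from collections import Counter
from itertools import product


def profile(sample):
    """The profile ignores symbol identities: 'abacb' and 'xyxzy'
    both have profile (2, 2, 1)."""
    return tuple(sorted(Counter(sample).values(), reverse=True))


def profile_probability(p, prof, n):
    """Probability that an i.i.d. size-n sample from p has the given
    profile, by enumerating all len(p)**n sequences (tiny cases only)."""
    total = 0.0
    for seq in product(range(len(p)), repeat=n):
        if profile(seq) == prof:
            seq_prob = 1.0
            for s in seq:
                seq_prob *= p[s]
            total += seq_prob
    return total


def pml_estimate(prof, n, k, grid=10):
    """Brute-force PML: search a coarse grid over the k-simplex for the
    distribution maximizing the observed profile's probability."""
    best_p, best_val = None, -1.0
    for counts in product(range(grid + 1), repeat=k):
        if sum(counts) != grid:
            continue
        p = [c / grid for c in counts]
        val = profile_probability(p, prof, n)
        if val > best_val:
            best_p, best_val = p, val
    return best_p


sample = "abacb"  # observed profile: (2, 2, 1)
p_hat = pml_estimate(profile(sample), n=len(sample), k=3)

# Plug-in property estimation: evaluate the property of interest
# (here, Shannon entropy) at the PML distribution.
entropy_hat = -sum(q * math.log(q) for q in p_hat if q > 0)
print(p_hat, entropy_hat)
```

The key design point the sketch captures is that PML maximizes the likelihood of the profile, summing over all labelings of the symbols, rather than the likelihood of the raw sequence; the plug-in step then works for any sorted property, which is why one estimator covers the range of tasks listed above.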
