We consider the problem of estimating the $L_1$ distance between two discrete probability measures $P$ and $Q$ from empirical data in a nonasymptotic and large alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$, we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples achieves performance comparable to that of the maximum likelihood estimator (MLE) with $n\ln n$ samples. When both $P$ and $Q$ are unknown, we construct minimax rate-optimal estimators whose worst case performance is essentially that of the known $Q$ case with $Q$ being uniform, implying that $Q$ being uniform is essentially the most difficult case. The \emph{effective sample size enlargement} phenomenon, identified in Jiao \emph{et al.} (2015), holds both in the known $Q$ case for every $Q$ and in the $Q$ unknown case. However, the construction of optimal estimators for $\|P-Q\|_1$ requires new techniques and insights beyond the approximation-based method of functional estimation in Jiao \emph{et al.} (2015).
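For concreteness, a minimal sketch of the quantities involved, under the standard setup (the alphabet size $S$, the counts $N_i$, and the empirical distribution $\hat{P}_n$ are the usual conventions, not notation fixed by the abstract): the functional being estimated and its plug-in (MLE) estimate are
\[
  \|P - Q\|_1 = \sum_{i=1}^{S} |p_i - q_i|,
  \qquad
  \|\hat{P}_n - Q\|_1 = \sum_{i=1}^{S} \left| \frac{N_i}{n} - q_i \right|,
\]
where $N_i$ is the number of occurrences of symbol $i$ among the $n$ samples and $\hat{p}_i = N_i/n$. The stated result compares the minimax rate-optimal estimator at sample size $n$ against this plug-in estimate at sample size $n\ln n$.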