Minimax Estimation of KL Divergence between Discrete Distributions

We consider the problem of estimating the KL divergence between two discrete probability measures $P$ and $Q$ from empirical data in a non-asymptotic and possibly large alphabet setting. We construct minimax rate-optimal estimators for $D(P\|Q)$ when the likelihood ratio is upper bounded by a constant which may depend on the support size, and show that the performance of the optimal estimator with $n$ samples is essentially that of the Maximum Likelihood Estimator (MLE) with $n\ln n$ samples. Our estimator is adaptive in the sense that it does not require knowledge of the support size or of the upper bound on the likelihood ratio. Our approach refines the \emph{Approximation} methodology recently developed for the construction of near minimax estimators of functionals of high-dimensional parameters, such as entropy, R\'enyi entropy, mutual information and $\ell_1$ distance in large alphabet settings, and shows that the \emph{effective sample size enlargement} phenomenon holds significantly more widely than previously established.