On Semi-supervised Estimation of Discrete Distributions under f-divergences

International Symposium on Information Theory (ISIT), 2024
Abstract

We study the problem of estimating the joint probability mass function (pmf) over two random variables. In particular, the estimation is based on the observation of $m$ samples containing both variables and $n$ samples missing one fixed variable. We adopt the minimax framework with $\ell_p^p$ loss functions. Recent work established that combinations of univariate minimax estimators achieve the minimax risk with the optimal first-order constant for $p \ge 2$ in the regime $m = o(n)$, but questions remained for $p \le 2$ and for various $f$-divergences. In this work, we affirm that these composite estimators are indeed minimax optimal for $\ell_p^p$ loss functions over the range $1 \le p \le 2$, including the critical $\ell_1$ loss. Additionally, we establish their optimality for a suite of $f$-divergences, including the KL, $\chi^2$, squared Hellinger, and Le Cam divergences.
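To make the setting concrete, the sketch below illustrates one natural composite plug-in estimator of the kind the abstract alludes to: the marginal of the always-observed variable is estimated from all $m + n$ samples, the conditional of the other variable is estimated from the $m$ complete samples, and the joint estimate is their product. The add-constant smoothing, the combination rule, and the loss/divergence helpers are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def composite_joint_estimate(xy_complete, x_only, kx, ky, alpha=0.5):
    """Hypothetical composite estimator of a joint pmf over {0..kx-1} x {0..ky-1}.

    xy_complete : (m, 2) int array of samples observing both X and Y.
    x_only      : (n,) int array of samples observing X alone.
    alpha       : add-constant smoothing parameter (assumed choice).
    """
    xy_complete = np.asarray(xy_complete)
    x_only = np.asarray(x_only)

    # Marginal of X from all m + n observations of X (add-constant smoothed).
    x_all = np.concatenate([xy_complete[:, 0], x_only])
    counts_x = np.bincount(x_all, minlength=kx).astype(float)
    p_x = (counts_x + alpha) / (counts_x.sum() + alpha * kx)

    # Conditional of Y given X from the m complete observations.
    counts_xy = np.zeros((kx, ky))
    np.add.at(counts_xy, (xy_complete[:, 0], xy_complete[:, 1]), 1.0)
    p_y_given_x = (counts_xy + alpha) / (
        counts_xy.sum(axis=1, keepdims=True) + alpha * ky
    )

    # Composite estimate of the joint pmf: P(x, y) = P(x) * P(y | x).
    return p_x[:, None] * p_y_given_x

# Standard losses and f-divergences named in the abstract, evaluated between
# two pmfs p and q (flattened joint pmfs work as well).
def lp_loss(p, q, power):
    """l_p^p loss: sum_i |p_i - q_i|^power."""
    return float(np.sum(np.abs(p - q) ** power))

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi2(p, q):
    """Chi-squared divergence: sum_i (p_i - q_i)^2 / q_i."""
    return float(np.sum((p - q) ** 2 / q))

def squared_hellinger(p, q):
    """Squared Hellinger distance: (1/2) sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return float(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def le_cam(p, q):
    """Le Cam divergence: sum_i (p_i - q_i)^2 / (2 (p_i + q_i))."""
    return float(np.sum((p - q) ** 2 / (2.0 * (p + q))))
```

As a usage note, an estimate from `composite_joint_estimate` can be compared against a true joint pmf with, e.g., `lp_loss(p_hat.ravel(), p_true.ravel(), 1)` for the $\ell_1$ loss or `kl(p_true.ravel(), p_hat.ravel())` for the KL divergence.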
