We derive the mean squared error convergence rates of kernel density-based plug-in estimators of mutual information measures between two multidimensional random variables X and Y for two cases: 1) X and Y are both continuous; 2) X is continuous and Y is discrete. Using the derived rates, we propose an ensemble estimator of these information measures for the second case by taking a weighted sum of the plug-in estimators with varied bandwidths. The resulting ensemble estimator achieves the parametric convergence rate when the conditional densities of the continuous variables are sufficiently smooth. To the best of our knowledge, this is the first nonparametric mutual information estimator known to achieve the parametric convergence rate for this case, which frequently arises in applications (e.g., variable selection in classification). The estimator is simple to implement as it uses the solution to an offline convex optimization problem and simple plug-in estimators. A central limit theorem is also derived for the ensemble estimator. Ensemble estimators that achieve the parametric rate are also derived for the first case (X and Y are both continuous) and a third case: 3) X and Y may have any mixture of discrete and continuous components.
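As a concrete illustration of the recipe the abstract describes, the sketch below forms kernel density plug-in estimates of I(X;Y) for the second case (X continuous, Y discrete) at several bandwidths and combines them with precomputed weights. This is a minimal sketch under stated assumptions, not the paper's implementation: the bandwidth schedule h_l = l * n^(-1/(2d)), the scale set, the bias-exponent constraints in `ensemble_weights`, and all function names (`kde_eval`, `plugin_mi`, `ensemble_mi`) are illustrative choices; the paper's convex program and constraint set follow from its derived MSE rates and should be taken from the paper itself.

```python
import numpy as np

def kde_eval(data, query, h, loo=False):
    """Gaussian KDE with bandwidth h: density of `data` rows evaluated at `query` rows."""
    n, d = data.shape
    sq = np.sum((query[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * h * h)) / ((2.0 * np.pi) ** (d / 2.0) * h ** d)
    if loo:  # leave-one-out: assumes query is data itself, in the same order
        np.fill_diagonal(K, 0.0)
        return K.sum(axis=1) / (n - 1)
    return K.sum(axis=1) / n

def plugin_mi(X, y, h):
    """Plug-in estimate of I(X;Y) = E[log f(X|Y) - log f(X)] at a single bandwidth h
    (X continuous, y discrete; each class is assumed to have several samples)."""
    fx = kde_eval(X, X, h, loo=True)                 # marginal density f(x) at each sample
    fxy = np.empty(len(y))
    for c in np.unique(y):
        m = (y == c)
        fxy[m] = kde_eval(X[m], X[m], h, loo=True)   # conditional density f(x | y = c)
    return np.mean(np.log(fxy / fx))

def ensemble_weights(scales, d):
    """Minimum-norm weights with sum(w) = 1 and sum_l w_l * l**(j/d) = 0 for j < d.
    A simple closed-form stand-in for the paper's offline convex program; the
    constraint exponents j/d are an illustrative choice of bias terms to cancel."""
    A = np.vstack([np.ones_like(scales)] +
                  [scales ** (j / d) for j in range(1, d)])
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    return np.linalg.pinv(A) @ b   # solves: min ||w||_2 subject to A w = b

def ensemble_mi(X, y, scales=None):
    """Weighted sum of plug-in estimators over a set of bandwidths."""
    n, d = X.shape
    if scales is None:
        scales = np.linspace(1.0, 3.0, 2 * d)        # hypothetical scale set
    h0 = n ** (-1.0 / (2.0 * d))                     # assumed schedule h_l = l * n^(-1/(2d))
    w = ensemble_weights(np.asarray(scales, dtype=float), d)
    estimates = np.array([plugin_mi(X, y, l * h0) for l in scales])
    return float(w @ estimates)

# Toy usage: two Gaussian classes in 2D with shifted means.
rng = np.random.default_rng(0)
n, d = 500, 2
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=1.5 * y[:, None], scale=1.0, size=(n, d))
print(f"ensemble MI estimate: {ensemble_mi(X, y):.3f} nats")
```

The weight step here uses the closed-form minimum-norm solution of the linear constraints, which coincides with the quadratic program min ||w||_2 subject to A w = b; the constraints are what cancel the leading bias terms of the individual plug-in estimators so that the weighted combination can converge at the parametric rate.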