We derive the mean squared error convergence rates of kernel density-based plug-in estimators of mutual information measures between two multidimensional random variables X and Y for two cases: 1) X and Y are both continuous; 2) X is continuous and Y is discrete. Using the derived rates, we propose an ensemble estimator of these information measures for the second case by taking a weighted sum of the plug-in estimators with varied bandwidths. The resulting ensemble estimator achieves the parametric convergence rate when the conditional densities of the continuous variables are sufficiently smooth. To the best of our knowledge, this is the first nonparametric mutual information estimator known to achieve the parametric convergence rate for this case, which frequently arises in applications (e.g., variable selection in classification). The estimator is simple to implement as it uses the solution to an offline convex optimization problem and simple plug-in estimators. A central limit theorem is also derived for the ensemble estimator. Ensemble estimators that achieve the parametric rate are also derived for the first case (X and Y are both continuous) and a third case: 3) X and Y may have any mixture of discrete and continuous components.
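As a concrete illustration of the recipe the abstract describes, the sketch below forms kernel density plug-in estimates of I(X;Y) for the second case (X continuous, Y discrete) at several bandwidths and combines them with precomputed weights. This is a minimal sketch under stated assumptions, not the paper's implementation: the bandwidth schedule h_l = l * n^(-1/(2d)), the scale set, the bias-exponent constraints in `ensemble_weights`, and all function names (`kde_eval`, `plugin_mi`, `ensemble_mi`) are illustrative choices; the paper's convex program and constraint set follow from its derived MSE rates and should be taken from the paper itself.

```python
import numpy as np

def kde_eval(data, query, h, loo=False):
    """Gaussian KDE with bandwidth h: density of `data` rows evaluated at `query` rows."""
    n, d = data.shape
    sq = np.sum((query[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * h * h)) / ((2.0 * np.pi) ** (d / 2.0) * h ** d)
    if loo:  # leave-one-out: assumes query is data itself, in the same order
        np.fill_diagonal(K, 0.0)
        return K.sum(axis=1) / (n - 1)
    return K.sum(axis=1) / n

def plugin_mi(X, y, h):
    """Plug-in estimate of I(X;Y) = E[log f(X|Y) - log f(X)] at a single bandwidth h
    (X continuous, y discrete; each class is assumed to have several samples)."""
    fx = kde_eval(X, X, h, loo=True)                 # marginal density f(x) at each sample
    fxy = np.empty(len(y))
    for c in np.unique(y):
        m = (y == c)
        fxy[m] = kde_eval(X[m], X[m], h, loo=True)   # conditional density f(x | y = c)
    return np.mean(np.log(fxy / fx))

def ensemble_weights(scales, d):
    """Minimum-norm weights with sum(w) = 1 and sum_l w_l * l**(j/d) = 0 for j < d.
    A simple closed-form stand-in for the paper's offline convex program; the
    constraint exponents j/d are an illustrative choice of bias terms to cancel."""
    A = np.vstack([np.ones_like(scales)] +
                  [scales ** (j / d) for j in range(1, d)])
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    return np.linalg.pinv(A) @ b   # solves: min ||w||_2 subject to A w = b

def ensemble_mi(X, y, scales=None):
    """Weighted sum of plug-in estimators over a set of bandwidths."""
    n, d = X.shape
    if scales is None:
        scales = np.linspace(1.0, 3.0, 2 * d)        # hypothetical scale set
    h0 = n ** (-1.0 / (2.0 * d))                     # assumed schedule h_l = l * n^(-1/(2d))
    w = ensemble_weights(np.asarray(scales, dtype=float), d)
    estimates = np.array([plugin_mi(X, y, l * h0) for l in scales])
    return float(w @ estimates)

# Toy usage: two Gaussian classes in 2D with shifted means.
rng = np.random.default_rng(0)
n, d = 500, 2
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=1.5 * y[:, None], scale=1.0, size=(n, d))
print(f"ensemble MI estimate: {ensemble_mi(X, y):.3f} nats")
```

The weight step here uses the closed-form minimum-norm solution of the linear constraints, which coincides with the quadratic program min ||w||_2 subject to A w = b; the constraints are what cancel the leading bias terms of the individual plug-in estimators so that the weighted combination can converge at the parametric rate.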