
Convergence Rates for Mixture-of-Experts

Abstract

In the mixture-of-experts (ME) model, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts, or to combine many simple ones? In this paper, we provide some insight into these problems through a theoretical study of an ME structure in which m experts are mixed, each expert being related to a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator (MLE), in terms of how fast the Kullback-Leibler divergence between the estimated density and the true density decreases as the sample size n increases. The convergence rate is found to depend on both m and k, and certain choices of m and k are found to produce optimal convergence rates. These results therefore shed light on the two aforementioned problems: how to choose m, and how m and k should be traded off, in order to achieve good convergence rates.
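To make the model class concrete, the sketch below is one plausible instantiation (not the paper's construction): a one-dimensional mixture-of-experts density p(y|x) = sum_j g_j(x) N(y; P_j(x), sigma_j^2), with m softmax-gated experts, each a polynomial regression of order k, fitted by maximizing the likelihood numerically. All names (fit_me, the toy data, the optimizer choice) are illustrative assumptions, and the loop over a few (m, k) pairs only mimics the trade-off the abstract discusses.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)

def unpack(theta, m, k):
    """Split the flat parameter vector into gating, expert, and scale blocks."""
    V = theta[:2 * m].reshape(m, 2)                            # gating: bias + slope per expert
    B = theta[2 * m:2 * m + m * (k + 1)].reshape(m, k + 1)     # polynomial coefficients per expert
    log_sigma = theta[2 * m + m * (k + 1):]                    # per-expert log noise scale
    return V, B, log_sigma

def neg_log_lik(theta, x, y, m, k):
    """Average negative log-likelihood of the mixture-of-experts density."""
    V, B, log_sigma = unpack(theta, m, k)
    # Log gating probabilities (softmax over experts), shape (n, m).
    gate_logits = np.column_stack([np.ones_like(x), x]) @ V.T
    log_gate = gate_logits - logsumexp(gate_logits, axis=1, keepdims=True)
    # Per-expert Gaussian log density of y around the polynomial mean, shape (n, m).
    X = np.vander(x, k + 1, increasing=True)                   # columns 1, x, ..., x^k
    mu = X @ B.T
    sigma = np.exp(log_sigma)
    log_norm = (-0.5 * ((y[:, None] - mu) / sigma) ** 2
                - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    return -np.mean(logsumexp(log_gate + log_norm, axis=1))

def fit_me(x, y, m, k):
    """Fit the m-expert, order-k model by (local) maximum likelihood."""
    theta0 = 0.1 * rng.standard_normal(2 * m + m * (k + 1) + m)
    return minimize(neg_log_lik, theta0, args=(x, y, m, k), method="L-BFGS-B")

# Toy data from a smooth nonlinear regression; compare a few (m, k) choices.
n = 500
x = rng.uniform(-2, 2, size=n)
y = np.sin(2 * x) + 0.2 * rng.standard_normal(n)
for m, k in [(1, 1), (2, 1), (4, 1), (2, 3)]:
    res = fit_me(x, y, m, k)
    print(f"m={m}, k={k}: avg. negative log-likelihood = {res.fun:.3f}")
```

In this toy setting, the in-sample fit improves as m or k grows, while the paper's analysis concerns how such choices trade off against estimation error as the sample size n increases.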
