Convergence Rates for Mixture-of-Experts
In this paper we study the mixture-of-experts (ME) model with experts in a one-parameter exponential family with mean $\mu(h(x))$, where $h$ is a polynomial of order $k$ and $\mu$ is the inverse link function. We derive sharp approximation rates with respect to the Kullback-Leibler divergence, and the convergence rate of the maximum likelihood estimator, to densities in a one-parameter exponential family with mean $\mu(f(x))$, where $f$ belongs to a Sobolev class with $\alpha$ derivatives. The convergence rate of the maximum likelihood estimator to the true density is expressed in terms of the number of observations $n$, the number of covariates $s$, the number of parameters of the polynomial $h$, the number of experts $m$, and the number of parameters of the weight functions. Further, if the maximum likelihood estimator is uniquely identified, we can remove the "$\log n$" term from the convergence rates. We close the paper by discussing model specification and its effects on the approximation and estimation errors, and conclude that the best error bound is achieved by balancing the number of experts $m$ against the order $k$ of the polynomial. We also explain how the results in this paper can be extended to more general approximation and target classes of densities.
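For concreteness, a minimal sketch of the model class the abstract describes; the softmax form of the weight (gating) functions and the symbols $g_j$, $w_j$, $\pi$ are illustrative assumptions, not taken from the paper:

\[
p(y \mid x) \;=\; \sum_{j=1}^{m} g_j(x)\,\pi\!\big(y;\, \mu(h_j(x))\big),
\qquad
g_j(x) \;=\; \frac{\exp\{w_j^\top x\}}{\sum_{l=1}^{m} \exp\{w_l^\top x\}},
\]

where each expert $\pi(y;\mu(h_j(x)))$ is a one-parameter exponential family density with mean $\mu(h_j(x))$, each $h_j$ is a polynomial of order $k$ in the $s$ covariates, and the $w_j$ collect the weight-function parameters.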