
Convergence Rates for Mixture-of-Experts

Abstract

In this paper we study the mixture-of-experts (ME) model with experts in a one-parameter exponential family with mean $\phi(h_k)$, where $h_k$ is a $k^{\text{th}}$-order polynomial and $\phi(\cdot)$ is the inverse link function. We derive sharp approximation rates with respect to the Kullback-Leibler divergence and the convergence rate of the maximum likelihood estimator to densities in a one-parameter exponential family with mean $\phi(h)$, where $h$ belongs to a Sobolev class with $\alpha$ derivatives. We find that the convergence rate of the maximum likelihood estimator to the true density is $O_p\left(m^{-2[\alpha\wedge(k+1)]/s} + (mJ_k + v_m)\,n^{-1}\log n\right)$, where $n$ is the number of observations, $s$ is the number of covariates, $J_k$ is the number of parameters of the polynomial $h_k$, $m$ is the number of experts, and $v_m$ is the number of parameters of the weight functions. Further, if the maximum likelihood estimator is uniquely identified, the $\log n$ factor can be removed from the convergence rate. We close the paper by discussing model specification and its effects on the approximation and estimation errors, concluding that the best error bound is achieved by balancing $k$ and $m$. We also explain how the results in this paper extend to more general approximation and target classes of densities.
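As a rough numerical illustration (not from the paper), the Python sketch below evaluates the two terms of the stated bound and searches for the number of experts $m$ that minimizes their sum for a few polynomial orders $k$. The unit constants, the choice $J_k = \binom{k+s}{s}$ (the number of coefficients of a degree-$k$ polynomial in $s$ covariates), and the gating parameter count $v_m = m(s+1)$ are all assumptions made for illustration, not definitions taken from the paper.

```python
from math import comb, log

def error_bound(m, n, s, k, alpha, v_m, C1=1.0, C2=1.0):
    """Evaluate the two terms of the bound
    O_p(m^{-2[alpha ^ (k+1)]/s} + (m J_k + v_m) n^{-1} log n).

    Assumptions (not from the paper): the constants C1, C2 are set
    to 1, and J_k = comb(k + s, s), the number of coefficients of a
    degree-k polynomial in s covariates.
    """
    J_k = comb(k + s, s)
    approx = C1 * m ** (-2 * min(alpha, k + 1) / s)  # approximation error
    estim = C2 * (m * J_k + v_m) * log(n) / n        # estimation error
    return approx + estim

# Hypothetical problem size; v_m = m * (s + 1) assumes softmax-style
# gates with one linear function of the covariates per expert.
n, s, alpha = 10_000, 2, 4.0
for k in (1, 2, 3):
    best = min((error_bound(m, n, s, k, alpha, v_m=m * (s + 1)), m)
               for m in range(1, 200))
    print(f"k={k}: best bound {best[0]:.4f} at m={best[1]}")
```

The printout makes the trade-off in the abstract concrete: raising $k$ improves the approximation exponent only up to $\alpha \wedge (k+1)$, beyond which it merely inflates $J_k$ in the estimation term, so the smallest bound comes from balancing $k$ against $m$.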
