Latent class models with covariates express the joint distribution of a multivariate categorical random variable under an assumption of conditional independence, given a covariate-dependent latent class variable. These models are popular in many fields, and current computational procedures for point estimation rely either on multi-step routines or combine the expectation-maximization (EM) algorithm with Newton-Raphson methods to facilitate the derivations in the maximization steps. Although these algorithms are routinely implemented, the multi-step strategies do not maximize the full-model log-likelihood, whereas the Newton-Raphson steps within the EM algorithm do not provide monotone log-likelihood sequences, thereby yielding routines that may not guarantee reliable maximization. To address these issues, we propose a nested EM algorithm, which relies on a sequence of conditional expectation-maximizations for the regression coefficients associated with the covariate-dependent latent class variables. Leveraging a recent Pólya-gamma data augmentation for logistic regression, the conditional expectation-maximizations reduce to a set of simple generalized least squares minimization problems, which provide monotone and stable log-likelihood sequences. We discuss performance gains in a real data application, and derive additional routines for regularized regression and Bayesian inference.
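The key computational idea described above can be illustrated in a simplified setting. The following is a minimal sketch, not the authors' implementation, of how a Pólya-gamma data augmentation turns the M-step of an EM algorithm for plain logistic regression (rather than the full latent class model) into a weighted least squares solve, with a monotone observed-data log-likelihood as a by-product; all function and variable names are illustrative.

```python
import numpy as np

def pg_em_logistic(X, y, n_iter=200, tol=1e-10):
    """EM for logistic regression via Polya-gamma augmentation.

    E-step: E[omega_i | beta] = tanh(eta_i / 2) / (2 * eta_i),
            with eta_i = x_i' beta (limit 1/4 as eta_i -> 0).
    M-step: weighted least squares,
            beta = (X' diag(w) X)^{-1} X' (y - 1/2).
    The observed-data log-likelihood is non-decreasing across iterations.
    """
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5
    ll_trace = []
    for _ in range(n_iter):
        eta = X @ beta
        # E-step: conditional expectation of the Polya-gamma weights
        with np.errstate(divide="ignore", invalid="ignore"):
            w = np.where(np.abs(eta) > 1e-8,
                         np.tanh(eta / 2.0) / (2.0 * eta),
                         0.25)
        # M-step: a single generalized least squares solve
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ kappa)
        # observed-data log-likelihood (numerically stable via logaddexp)
        eta = X @ beta
        ll = np.sum(y * eta - np.logaddexp(0.0, eta))
        if ll_trace and ll - ll_trace[-1] < tol:
            ll_trace.append(ll)
            break
        ll_trace.append(ll)
    return beta, np.array(ll_trace)

# Example usage on simulated data
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.5, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat, ll_trace = pg_em_logistic(X, y)
```

In contrast to Newton-Raphson, each update here maximizes a minorizing surrogate of the log-likelihood, which is what guarantees the monotone sequence referenced in the abstract.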