
A Spectral Algorithm for Learning Hidden Markov Models

Abstract

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. Typically, they are learned using search heuristics (such as the Baum-Welch / EM algorithm), which suffer from the usual local optima issues. While in general these models are known to be hard to learn from samples from the underlying distribution, we provide the first provably efficient algorithm (in terms of sample and computational complexity) for learning HMMs under a natural separation condition. This condition is roughly analogous to the separation conditions considered for learning mixture distributions (where, similarly, these models are hard to learn in general). Furthermore, our sample complexity results do not explicitly depend on the number of distinct (discrete) observations -- they implicitly depend on this number through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing, where the space of observations is sometimes the set of words in a language. Finally, the algorithm is particularly simple, relying only on a singular value decomposition and matrix multiplications.
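To make the "SVD plus matrix multiplications" claim concrete, here is a minimal sketch of the spectral recipe on a toy HMM. The transition matrix `T`, observation matrix `O`, initial distribution `pi`, and the specific numbers are all hypothetical illustration choices, not from the paper; and for brevity the low-order moment matrices are computed exactly from the true parameters, whereas the actual algorithm estimates them from sample sequences.

```python
import numpy as np

# Hypothetical toy HMM (column-stochastic parameters), chosen only for illustration.
m, n = 2, 3                      # hidden states, distinct observations
T = np.array([[0.8, 0.3],        # T[i, j] = Pr[next state = i | state = j]
              [0.2, 0.7]])
O = np.array([[0.7, 0.1],        # O[x, h] = Pr[observation = x | state = h]
              [0.2, 0.2],
              [0.1, 0.7]])
pi = np.array([0.6, 0.4])        # initial state distribution

# Exact low-order moments; a real learner would estimate these from data.
P1 = O @ pi                                  # P1[x]      = Pr[x1 = x]
P21 = O @ T @ np.diag(pi) @ O.T              # P21[x2,x1] = Pr[x2, x1]
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
        for x in range(n)]                   # Pr[x3, x2 = x, x1]

# The only linear algebra needed: one SVD plus matrix products.
U, _, _ = np.linalg.svd(P21)
U = U[:, :m]                                 # top-m left singular vectors

b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def spectral_prob(seq):
    """Pr[x1, ..., xt] via the observable-operator form b_inf' B_xt ... B_x1 b_1."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

def forward_prob(seq):
    """The same probability via the standard forward recursion, for comparison."""
    a = np.diag(O[seq[0]]) @ pi
    for x in seq[1:]:
        a = np.diag(O[x]) @ T @ a
    return float(a.sum())

print(spectral_prob([0, 2, 1, 2]), forward_prob([0, 2, 1, 2]))
```

With exact moments the two quantities agree whenever the rank condition on the pair-probability matrix holds (here both `O` and `T @ diag(pi) @ O.T` have rank `m`); with estimated moments, the paper's separation condition controls how fast the estimates converge.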
