Prediction with a Short Memory
We consider the problem of predicting the next observation given a sequence of past observations. We show that for any distribution over observations, if the mutual information between past observations and future observations is upper bounded by $I$, then a simple Markov model over the most recent $I/\epsilon$ observations obtains expected KL error $\epsilon$ (and hence $\ell_1$ error $\sqrt{\epsilon}$) with respect to the optimal predictor that has access to the entire past. For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time. We also establish that this result cannot be improved upon, in the following senses: First, a window length of $\log n/\epsilon$ is information-theoretically necessary for expected KL error $\epsilon$, or $\ell_1$ error $\sqrt{\epsilon}$. Second, the $d^{\Theta(\log n/\epsilon)}$ samples required to accurately estimate the Markov model when observations are drawn from an alphabet of size $d$ are necessary for any computationally tractable learning/prediction algorithm, assuming the hardness of strongly refuting a certain class of CSPs.
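The upper bound is attained by a naive order-$\ell$ Markov predictor: estimate the conditional distribution of the next symbol from empirical frequencies of length-$\ell$ windows in the observed sequence (the $\sqrt{\epsilon}$ conversion from KL to $\ell_1$ error is Pinsker's inequality). Below is a minimal sketch of such a window-based empirical predictor, assuming integer observations in $\{0, \dots, d-1\}$; the class name and the add-one smoothing are illustrative choices, not the paper's exact construction.

```python
from collections import defaultdict

class WindowMarkovPredictor:
    """Order-`window` Markov predictor: estimates P(next | last `window` symbols)
    from empirical counts; `window` plays the role of I/epsilon in the bound."""

    def __init__(self, window: int, alphabet_size: int):
        self.window = window
        self.d = alphabet_size
        # Map: length-`window` context tuple -> per-symbol counts.
        self.counts = defaultdict(lambda: [0] * alphabet_size)

    def fit(self, sequence: list[int]) -> None:
        # Tally (context -> next symbol) transitions along the sequence.
        for t in range(self.window, len(sequence)):
            ctx = tuple(sequence[t - self.window:t])
            self.counts[ctx][sequence[t]] += 1

    def predict(self, context: list[int]) -> list[float]:
        # Smoothed conditional distribution over the next symbol,
        # using add-one (Laplace) smoothing over the alphabet.
        ctx = tuple(context[-self.window:])
        c = self.counts[ctx]
        total = sum(c) + self.d
        return [(ci + 1) / total for ci in c]

# Toy usage on an alternating binary sequence (d = 2):
seq = [0, 1] * 50
model = WindowMarkovPredictor(window=2, alphabet_size=2)
model.fit(seq)
print(model.predict([0, 1]))  # mass concentrates on symbol 0
```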