
How random are a learner's mistakes?

Abstract

Consider a random binary sequence $X^{(n)}$ of random variables $X_t$, $t=1,2,\ldots,n$, for instance one generated by a Markov source (teacher) of order $k^*$ (each state represented by $k^*$ bits). Assume that the probability of the event $X_t=1$ is constant and denote it by $\beta$. Consider a learner based on a parametric model, for instance a Markov model of order $k$, which trains on a sequence $x^{(m)}$ drawn randomly by the teacher. Test the learner's performance by giving it a sequence $x^{(n)}$ (generated by the teacher) and checking its prediction on every bit of $x^{(n)}$. An error occurs at time $t$ if the learner's prediction $Y_t$ differs from the true bit value $X_t$. Denote by $\xi^{(n)}$ the sequence of errors, where the error bit $\xi_t$ at time $t$ equals 1 or 0 according to whether an error occurs or not, respectively. Consider the subsequence $\xi^{(\nu)}$ of $\xi^{(n)}$ corresponding to the errors of predicting a 0, i.e., $\xi^{(\nu)}$ consists of the bits of $\xi^{(n)}$ only at times $t$ such that $Y_t=0$. In this paper we compute an estimate on the deviation of the frequency of 1s in $\xi^{(\nu)}$ from $\beta$. The result shows that the level of randomness of $\xi^{(\nu)}$ decreases as the complexity of the learner increases.
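The setup can be made concrete with a short simulation. The following is a minimal sketch, not the paper's construction: it assumes a teacher Markov source of order $k^*=2$ with hand-picked conditional probabilities (the dictionary `P1` below is illustrative), a learner that predicts the majority bit observed after each length-$k$ training context, and it estimates the marginal $\beta$ empirically from a long teacher sample rather than fixing it analytically. It prints the frequency of 1s in $\xi^{(\nu)}$ for several learner orders $k$, which can then be compared with $\beta$.

```python
# Sketch of the teacher/learner setup from the abstract, under the
# assumptions stated above. All numeric values are illustrative.
import random
from collections import defaultdict

K_STAR = 2
# Teacher: P(X_t = 1 | last two bits) -- hypothetical values, not from the paper.
P1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.8}

def teacher_sequence(n, rng):
    """Draw a length-n sequence from the order-k* Markov teacher."""
    x = [rng.randint(0, 1) for _ in range(K_STAR)]  # arbitrary initial state
    for _ in range(n - K_STAR):
        ctx = tuple(x[-K_STAR:])
        x.append(1 if rng.random() < P1[ctx] else 0)
    return x

def train_learner(x_train, k):
    """Count how often each bit follows each length-k context in training."""
    counts = defaultdict(lambda: [0, 0])  # context -> [#0s, #1s]
    for t in range(k, len(x_train)):
        counts[tuple(x_train[t - k:t])][x_train[t]] += 1
    return counts

def freq_ones_in_xi_nu(k, m=5000, n=200000, seed=1):
    """Frequency of 1s in xi^(nu): error bits at times where the learner predicts 0."""
    rng = random.Random(seed)
    counts = train_learner(teacher_sequence(m, rng), k)
    x_test = teacher_sequence(n, rng)
    xi_nu = []
    for t in range(k, n):
        c0, c1 = counts.get(tuple(x_test[t - k:t]), [1, 0])  # unseen context -> predict 0
        if c1 <= c0:                 # learner predicts Y_t = 0
            xi_nu.append(x_test[t])  # since Y_t = 0, the error bit equals X_t
    return sum(xi_nu) / len(xi_nu)

if __name__ == "__main__":
    rng = random.Random(0)
    beta = sum(teacher_sequence(10**6, rng)) / 10**6  # empirical marginal of 1s
    print(f"estimated beta = {beta:.4f}")
    for k in (0, 1, 2, 4, 8):
        print(f"k = {k}: freq of 1s in xi^(nu) = {freq_ones_in_xi_nu(k):.4f}")
```

Varying the training length `m` or the teacher's conditional probabilities in this sketch lets one observe how the gap between the reported frequency and $\beta$ behaves as the learner's order $k$ changes, which is the quantity the paper's estimate addresses.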
