Convergence of Contrastive Divergence with Annealed Learning Rate in
Exponential Family
In our recent paper \cite{wu2016convergence}, we showed that, in the exponential family, contrastive divergence (CD) with a fixed learning rate yields asymptotically consistent estimates. In this paper, we establish the consistency and convergence rate of CD with an annealed learning rate. Specifically, suppose CD-$m$ generates the sequence of parameters $\{\theta_t\}$ using an i.i.d. data sample of size $n$; then the distance between $\theta_t$ and the true parameter converges in probability to 0 at an explicitly characterized rate in $t$. The number $m$ of MCMC transitions in CD-$m$ affects only the constant factor of the convergence rate. Our proof is not a simple extension of the one in \cite{wu2016convergence}, which depends critically on the fact that $\{\theta_t\}$ is a homogeneous Markov chain conditional on the observed sample. Under an annealed learning rate, the homogeneous Markov property is no longer available, and we instead develop an alternative approach based on super-martingales. Experimental results of CD on a fully-visible Boltzmann machine are provided to demonstrate our theoretical results.
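To make the setting concrete, the following is a minimal sketch of CD-$m$ with an annealed learning rate on a fully-visible Boltzmann machine, the model used in the experiments. All names (`cd_m_annealed`, `gibbs_step`), the specific annealing schedule $\eta_t = c/t$, and the toy data are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: CD-m with annealed learning rate on a fully-visible
# Boltzmann machine over binary units in {0, 1}. Function names, the
# schedule eta_t = c / t, and the toy data are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(x, W, b):
    """One sweep of Gibbs sampling over all units."""
    x = x.copy()
    for i in range(len(x)):
        p = 1.0 / (1.0 + np.exp(-(W[i] @ x + b[i])))  # P(x_i = 1 | rest)
        x[i] = float(rng.random() < p)
    return x

def cd_m_annealed(data, m=1, c=0.5, epochs=20):
    """CD-m updates with learning rate eta_t = c / t (t = update count)."""
    n, d = data.shape
    W = np.zeros((d, d))  # symmetric couplings, zero diagonal
    b = np.zeros(d)       # biases
    t = 0
    for _ in range(epochs):
        for x in data:
            t += 1
            eta = c / t              # annealed learning rate
            xm = x.astype(float)
            for _ in range(m):       # m MCMC transitions (CD-m)
                xm = gibbs_step(xm, W, b)
            # CD stochastic gradient: data statistics minus model statistics
            gW = np.outer(x, x) - np.outer(xm, xm)  # already symmetric
            np.fill_diagonal(gW, 0.0)
            W += eta * gW
            b += eta * (x - xm)
    return W, b

# Toy data: unit 1 copies unit 0, unit 2 is independent.
data = rng.integers(0, 2, size=(200, 3)).astype(float)
data[:, 1] = data[:, 0]
W, b = cd_m_annealed(data, m=1)
print(np.round(W, 2))
```

The schedule $\eta_t \propto 1/t$ is one standard choice of annealing; the updates keep `W` symmetric with zero diagonal, as required for a Boltzmann machine.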