
Improving and Scaling Trans-dimensional Random Field Language Models

Abstract

The dominant language models (LMs), such as n-gram and neural network (NN) models, represent sentence probabilities in terms of conditionals. In contrast, a trans-dimensional random field (TRF) LM, in which the whole sentence is modeled as a random field, was recently introduced and shown to achieve superior performance. In this paper, we further develop TRF LMs with two technical improvements: a new method of exploiting Hessian information in parameter optimization to further improve the convergence of the training algorithm, and a method that enables training TRF LMs on large corpora, which may contain rare, very long sentences. Experiments show that TRF LMs can scale to training data of up to 32 million words, consistently achieve 10% relative perplexity reductions over 5-gram LMs, and perform as well as NN LMs while computing sentence probabilities much faster.
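To make the contrast between the two model families concrete, the following sketch writes out both formulations. The notation is an assumption made for illustration and is not taken from the abstract: $x^l = (x_1, \dots, x_l)$ denotes a sentence of length $l$, $f$ a feature vector, $\lambda$ its parameters, $\pi_l$ a prior over sentence lengths, and $Z_l(\lambda)$ a length-specific normalizing constant. The second line is one plausible form of a whole-sentence random field model, not necessarily the exact parameterization used in the paper.

```latex
% Conditional (n-gram or NN) LMs: chain-rule factorization of the sentence probability.
p(x_1, \dots, x_l) = \prod_{i=1}^{l} p(x_i \mid x_1, \dots, x_{i-1})

% Whole-sentence random field LM (assumed form): the sentence is scored jointly,
% with a separate normalizing constant for each sentence length l.
p(l, x^l; \lambda) = \frac{\pi_l}{Z_l(\lambda)} \exp\!\left( \lambda^{\top} f(x^l) \right),
\qquad
Z_l(\lambda) = \sum_{x^l} \exp\!\left( \lambda^{\top} f(x^l) \right)
```

Under this assumed form, scoring a sentence needs only one feature evaluation and a lookup of the precomputed $Z_l(\lambda)$, rather than a normalization over the vocabulary at every position, which is consistent with the speed advantage over NN LMs reported above.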
