We present the latent quality model (LQM) for joint modeling of topics and citations in document networks. The LQM combines the strengths of latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMB), and associates each document with a latent quality score. This score provides a topic-free measure of a document's impact that is distinct from its raw citation count. We develop an efficient algorithm for fitting the LQM using variational methods. To scale to large networks, we develop an online variant based on stochastic gradient methods and a case-control likelihood approximation. We evaluate the LQM on the benchmark KDD Cup 2003 dataset, which contains approximately 30,000 high energy physics papers, and demonstrate that the LQM can significantly improve citation prediction.
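To illustrate the general idea behind a case-control likelihood approximation, the following is a minimal Python sketch and not the LQM implementation. It assumes a plain Bernoulli link model in which every candidate pair has a known link probability; the function names, the `probs` and `links` structures, and the toy data are all hypothetical. All linked pairs (cases) are kept, while unlinked pairs (controls) are subsampled and reweighted so that the estimate stays unbiased for the full sum.

```python
import math
import random


def full_log_likelihood(probs, links):
    """Exact Bernoulli log-likelihood summed over every candidate pair."""
    total = 0.0
    for pair, p in probs.items():
        total += math.log(p) if pair in links else math.log(1.0 - p)
    return total


def case_control_log_likelihood(probs, links, num_controls, rng):
    """Keep all linked pairs (cases); subsample unlinked pairs (controls)
    and reweight them so the control term is unbiased for the full sum."""
    if num_controls < 1:
        raise ValueError("num_controls must be at least 1")
    cases = [pair for pair in probs if pair in links]
    non_links = [pair for pair in probs if pair not in links]
    case_term = sum(math.log(probs[pair]) for pair in cases)
    if not non_links:
        return case_term
    m = min(num_controls, len(non_links))
    controls = rng.sample(non_links, m)
    weight = len(non_links) / m  # inverse of the sampling fraction
    control_term = weight * sum(math.log(1.0 - probs[pair]) for pair in controls)
    return case_term + control_term


# Toy data (hypothetical): 6 documents, 30 ordered pairs, 3 observed citations.
rng = random.Random(0)
pairs = [(i, j) for i in range(6) for j in range(6) if i != j]
probs = {pair: 0.05 + 0.9 * rng.random() for pair in pairs}
links = {(0, 1), (2, 3), (4, 5)}

exact = full_log_likelihood(probs, links)
approx = case_control_log_likelihood(probs, links, num_controls=10, rng=rng)
```

In a sparse citation network almost all pairs are unlinked, so the control term dominates the cost of the exact sum. Subsampling it cuts the per-step work to roughly the number of cases plus `num_controls`, at the price of extra variance in the estimate.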