Semi-Supervised Learning on Graphs through Reach and Distance Diffusion
Semi-supervised learning (SSL) algorithms are an indispensable tool when labeled examples are scarce and unlabeled examples are plentiful [Blum and Chawla 2001, Zhu et al. 2003]. With graph-based methods, entities (examples) correspond to nodes in a graph and edges connect related entities. The graph structure is used to infer a {\em kernel} (pairwise affinity values), which is then used to compute the learned labels. The most popular SSL methods are {\em spectral}. "Symmetric" spectral methods scale well using Jacobi iterations. Personalized PageRank (PPR) applies to directed relations, such as likes, follows, or hyperlinks, but does not scale as well. We propose here a novel SSL framework that is derived from powerful social and economic models of centrality and influence in networks [Kempe, Kleinberg, and Tardos 2003] and that, in that space, complements spectral centrality models such as PageRank. Our {\em Reach diffusion} and {\em Distance diffusion} kernels capture the pairwise relations that underlie influence. We develop highly scalable algorithms for parameter setting and label learning with our kernels. Our framework offers high scalability, handling of directed relations, and the promise of an alternative approach that can be more suitable for some applications.
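For readers unfamiliar with the "symmetric" spectral baseline mentioned above, the following is a minimal sketch of Jacobi-style label propagation on an undirected graph, in the spirit of the harmonic-function approach of Zhu et al. 2003. It is not the Reach or Distance diffusion method proposed here; the function name `propagate_labels`, its parameters, and the toy path graph are assumptions made purely for illustration.

```python
import numpy as np

def propagate_labels(adj, labels, labeled_mask, iters=200):
    """Illustrative Jacobi-style iteration (hypothetical helper, not from the paper).

    Each unlabeled node's score is repeatedly replaced by the weighted
    average of its neighbors' scores; labeled (seed) nodes stay clamped.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels, dtype=float)
    labeled_mask = np.asarray(labeled_mask, dtype=bool)

    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes

    # Start from the seed labels; unlabeled nodes start at 0.
    f = np.where(labeled_mask, labels, 0.0)
    for _ in range(iters):
        new_f = (adj @ f) / deg                      # neighbor average
        new_f[labeled_mask] = labels[labeled_mask]   # clamp the seeds
        f = new_f
    return f

# Toy example (assumed data): a path 0 - 1 - 2 - 3 with node 0 labeled 1
# and node 3 labeled 0.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
labels = np.array([1.0, 0.0, 0.0, 0.0])
labeled_mask = np.array([True, False, False, True])
scores = propagate_labels(adj, labels, labeled_mask)
```

On this toy path the two interior scores approach 2/3 and 1/3, the harmonic interpolation between the two seeds. Each sweep costs one pass over the edges, which is why such symmetric iterations scale well; the sketch assumes an undirected (symmetric) adjacency matrix and does not address the directed setting that motivates PPR and the kernels proposed here.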