Active Sampling for Graph-Cognizant Classification via Expected Model Change

19 May 2017

Abstract

The present work considers active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed from data using pairwise feature similarity measures. Leveraging the graph rests on the premise that labels vary smoothly over neighboring nodes and are modeled by a Markov random field. The model is further relaxed to a Gaussian field in which labels may take continuous values, an approximation that mitigates the combinatorial complexity of the binary model. The proposed sampling strategy queries the node that is expected to inflict the largest change on the model. This strategy subsumes several measures of expected model change, thus unifying and establishing links with existing methods such as uncertainty sampling, variance minimization, and sampling based on the $\Sigma$-optimality criterion. A simple yet effective heuristic is also introduced for increasing exploration and reducing the bias of the resulting estimator by taking into account the confidence in the model's label predictions. More importantly, the novel sampling strategies rely on quantities that are readily available without model retraining, which makes them scalable to large graphs. Numerical tests using synthetic and real data indicate that the proposed methods achieve accuracy comparable or superior to the state of the art, even at reduced runtime.
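To make the idea of querying by expected model change concrete, the following is a minimal sketch under stated assumptions, not the paper's exact criteria. It assumes a Gaussian field with precision matrix $J = L + \delta I$ ($L$ the graph Laplacian, $\delta$ a small hypothetical regularizer), labels in $\{-1, +1\}$, and a hypothetical label model $P(y_i = +1) = (1 + f_i)/2$. Under Gaussian conditioning, observing $y_i$ shifts the mean of the unlabeled nodes by $\Sigma_{:,i}(y_i - f_i)/\Sigma_{ii}$, so the expected squared change factors into an uncertainty term $1 - f_i^2$ times a covariance-based influence term. The function names `gaussian_field_posterior` and `expected_change_scores` are illustrative only.

```python
import numpy as np

def gaussian_field_posterior(W, labeled, y_labeled, delta=1e-2):
    """Conditional mean/covariance of unlabeled nodes under an assumed
    Gaussian field with precision J = L + delta * I (L = graph Laplacian)."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    J = L + delta * np.eye(n)
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    J_uu = J[np.ix_(unlabeled, unlabeled)]
    J_ul = J[np.ix_(unlabeled, labeled)]
    cov = np.linalg.inv(J_uu)          # conditional covariance of f_u
    mean = -cov @ J_ul @ y_labeled     # conditional mean of f_u
    return unlabeled, mean, cov

def expected_change_scores(mean, cov):
    """Hypothetical expected squared change of the unlabeled mean if node i is queried."""
    f = np.clip(mean, -1.0, 1.0)       # keep f in [-1, 1] so the label model is valid
    # With P(y_i = +1) = (1 + f_i) / 2, E[(y_i - f_i)^2] = 1 - f_i^2 (uncertainty term)
    expected_sq_residual = 1.0 - f**2
    # ||Sigma[:, i]||^2 / Sigma_ii^2: how far one observation moves the whole mean vector
    influence = (cov**2).sum(axis=0) / np.diag(cov)**2
    return expected_sq_residual * influence

# Usage on a toy 5-node path graph (hypothetical data): ends labeled +1 and -1
W = np.zeros((5, 5))
for a in range(4):
    W[a, a + 1] = W[a + 1, a] = 1.0
labeled = np.array([0, 4])
y_labeled = np.array([1.0, -1.0])
unlabeled, mean, cov = gaussian_field_posterior(W, labeled, y_labeled)
scores = expected_change_scores(mean, cov)
query = unlabeled[np.argmax(scores)]
print(query)  # 2
```

On the path graph the middle node has the most uncertain prediction ($f \approx 0$) and is selected. Note that all scores come from a single posterior computation, with no retraining per candidate, which is in the spirit of the scalability claim above; the explicit matrix inverse here is only for brevity and would be avoided on large graphs.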
