Asymptotic behavior of $\ell_p$-based Laplacian regularization in semi-supervised learning

Given a weighted graph with $N$ vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes $n$ labeled vertices, and the task is to label the remaining ones. We present a theoretical study of $\ell_p$-based Laplacian regularization under a $d$-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as $N$ grows to infinity while $n$ stays constant; the associated optimality conditions lead to a partial differential equation that must be satisfied by the associated function estimate $\hat{f}$. From this formulation we derive several predictions on the limiting behavior of the $d$-dimensional function $\hat{f}$, including (a) a phase transition in its smoothness at the threshold $p = d + 1$, and (b) a tradeoff between smoothness and sensitivity to the underlying unlabeled data distribution $P$. Thus, over the range $p \leq d$, the function estimate $\hat{f}$ is degenerate and "spiky," whereas for $p \geq d + 1$, the function estimate $\hat{f}$ is smooth. We show that the effect of the underlying density vanishes monotonically with $p$, such that in the limit $p = \infty$, corresponding to the so-called Absolutely Minimal Lipschitz Extension, the estimate $\hat{f}$ is independent of the distribution $P$. Under the assumption of semi-supervised smoothness, ignoring $P$ can lead to poor statistical performance; in particular, we construct a specific example for $d = 1$ to demonstrate that $p = 2$ has lower risk than $p = \infty$, due to the former penalty adapting to $P$ while the latter ignores it. We also provide simulations that verify the accuracy of our predictions for finite sample sizes. Together, these properties show that $p = d + 1$ is an optimal choice, yielding a function estimate $\hat{f}$ that is both smooth and non-degenerate, while remaining maximally sensitive to $P$.
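For concreteness, the $\ell_p$-based Laplacian regularization referred to above is, in its standard interpolating form, a variational problem of the following type (a sketch under the usual graph-regularization conventions; the paper's precise edge weights and normalization are not reproduced here):
\[
\hat{f} \;\in\; \operatorname*{arg\,min}_{f \,:\, f(x_i) = y_i \ \text{for all labeled } i} \;\; \sum_{i,j=1}^{N} W_{ij}\, \bigl| f(x_i) - f(x_j) \bigr|^{p},
\]
where $W_{ij}$ denote the edge weights of the geometric random graph and $y_i$ the observed labels. Heuristically, as $p$ grows the penalty is increasingly dominated by the largest differences across edges and the influence of the density-dependent weights fades, which is consistent with the $p = \infty$ (Absolutely Minimal Lipschitz Extension) limit described above.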