426

Scalability and robustness of spectral embedding: landmark diffusion is all you need

Information and Inference A Journal of the IMA (JIII), 2020
Abstract

While spectral embedding is a widely applied dimension reduction technique in various fields, so far it is still challenging to make it scalable and robust to handle "big data". Motivated by the need of handling such data, we propose a novel spectral embedding algorithm, which we coined Robust and Scalable Embedding via Landmark Diffusion (ROSELAND). In short, we measure the affinity between two points via a set of landmarks, which is composed of a small number of points, and "diffuse" on the dataset via the landmark set to achieve a spectral embedding. The embedding is not only scalable and robust, but also preserves the geometric properties under the manifold setup. The Roseland can be viewed as a generalization of the commonly applied spectral embedding algorithm, the diffusion map (DM), in the sense that it shares various properties of the DM. In addition to providing a theoretical justification of the Roseland under the manifold setup, including handling the U-statistics-like quantities, providing a LL^\infty spectral convergence with a rate, and offering a high dimensional noise analysis, we show various numerical simulations and compare the Roseland with other existing algorithms.

View on arXiv
Comments on this paper