We study the use of "sign $\alpha$-stable random projections" (where $0 < \alpha \leq 2$) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and near-neighbor search). After the processing by sign stable random projections, the inner products of the processed data approximate various types of nonlinear kernels depending on the value of $\alpha$. Thus, this approach provides an effective strategy for approximating nonlinear learning algorithms essentially at the cost of linear learning. When $\alpha = 2$, it is known that the corresponding nonlinear kernel is the arc-cosine kernel. When $\alpha = 1$, the procedure approximates the arc-cos-$\chi^2$ kernel (under a certain condition). When $\alpha \rightarrow 0+$, it corresponds to the resemblance kernel. From a practitioner's perspective, the method of sign $\alpha$-stable random projections is ready to be tested for large-scale learning applications, where $\alpha$ can simply be viewed as a tuning parameter. What is missing in the literature is an extensive empirical study demonstrating the effectiveness of sign stable random projections, especially for $\alpha \neq 2$ or 1. This paper supplies such a study on a wide variety of classification datasets. In particular, we compare sign stable random projections side-by-side with the recently proposed "0-bit consistent weighted sampling (CWS)" (Li 2015).
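To make the pipeline concrete, here is a minimal NumPy sketch (not the paper's code): it samples a symmetric $\alpha$-stable projection matrix via the standard Chambers-Mallows-Stuck recipe, projects the data, and keeps only the signs; the fraction of agreeing signs between two sign codes then estimates the collision probability, which is monotone in the corresponding nonlinear kernel. The function names (`stable_random_matrix`, `sign_stable_projections`) and all parameter choices below are illustrative assumptions, not from the paper.

```python
import numpy as np

def stable_random_matrix(alpha, d, k, rng):
    """Sample a d x k matrix of i.i.d. symmetric alpha-stable variates
    via the Chambers-Mallows-Stuck method (Gaussian up to scale at
    alpha = 2; standard Cauchy at alpha = 1)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size=(d, k))
    w = rng.exponential(1.0, size=(d, k))
    if alpha == 1.0:
        return np.tan(u)  # Cauchy special case
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def sign_stable_projections(X, alpha, k, seed=0):
    """Project the (nonnegative) data with an alpha-stable random matrix
    and keep only the signs of the k projected values."""
    rng = np.random.default_rng(seed)
    R = stable_random_matrix(alpha, X.shape[1], k, rng)
    return np.sign(X @ R)

# Toy usage: the fraction of agreeing signs between two sign codes
# estimates the collision probability, which is monotone in the
# underlying nonlinear kernel (e.g., the arc-cosine kernel at alpha = 2).
X = np.abs(np.random.default_rng(1).standard_normal((2, 100)))
codes = sign_stable_projections(X, alpha=1.0, k=4096)
print("sign agreement:", np.mean(codes[0] == codes[1]))
```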