
Block Model Guided Unsupervised Feature Selection

Abstract

Feature selection is a core area of data mining with a recent innovation of graph-driven unsupervised feature selection for linked data. In this setting we have a dataset $\mathbf{Y}$ consisting of $n$ instances each with $m$ features and a corresponding $n$-node graph (whose adjacency matrix is $\mathbf{A}$) with an edge indicating that the two instances are similar. Existing efforts for unsupervised feature selection on attributed networks have explored either directly regenerating the links by solving for $f$ such that $f(\mathbf{y}_i,\mathbf{y}_j) \approx \mathbf{A}_{i,j}$ or finding community structure in $\mathbf{A}$ and using the features in $\mathbf{Y}$ to predict these communities. However, graph-driven unsupervised feature selection remains an understudied area with respect to exploring more complex guidance. Here we take the novel approach of first building a block model on the graph and then using the block model for feature selection. That is, we discover $\mathbf{F}\mathbf{M}\mathbf{F}^T \approx \mathbf{A}$ and then find a subset of features $\mathcal{S}$ that induces another graph to preserve both $\mathbf{F}$ and $\mathbf{M}$. We call our approach Block Model Guided Unsupervised Feature Selection (BMGUFS). Experimental results show that our method outperforms the state of the art on several real-world public datasets in finding high-quality features for clustering.
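
To make the two-step idea concrete, the following is a minimal sketch and not the BMGUFS objective itself. It assumes hard (one-hot) block assignments $\mathbf{F}$ that are already known, fits the image matrix $\mathbf{M}$ by least squares so that $\mathbf{F}\mathbf{M}\mathbf{F}^T \approx \mathbf{A}$, and then scores a candidate feature subset $\mathcal{S}$ by how closely the graph it induces reproduces $\mathbf{M}$ under the same $\mathbf{F}$. The linear-kernel induced graph, the scale normalization, and the helper names `block_model_image` and `subset_score` are illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def block_model_image(A, F):
    """Least-squares image matrix M for fixed F, so that F @ M @ F.T approximates A."""
    P = np.linalg.pinv(F)  # (k x n) pseudo-inverse of the assignment matrix
    return P @ A @ P.T

def subset_score(Y, S, F, M):
    """Hypothetical score: higher means the graph induced by features S
    better preserves the block structure (F, M). Assumes a linear-kernel graph."""
    Ys = Y[:, S]
    A_s = Ys @ Ys.T                      # induced similarity graph (assumption)
    M_s = block_model_image(A_s, F)      # image matrix of the induced graph under the same F
    norm_s, norm_m = np.linalg.norm(M_s), np.linalg.norm(M)
    if norm_s == 0 or norm_m == 0:
        return -np.inf                   # degenerate subset: no structure to compare
    # Compare after scale normalization, since the two graphs use different units.
    return -np.linalg.norm(M_s / norm_s - M / norm_m)

# Toy data: two blocks of three nodes, dense within blocks, no edges between them.
A = np.kron(np.eye(2), np.ones((3, 3)))
F = np.kron(np.eye(2), np.ones((3, 1)))  # one-hot block assignments (n x k)
M = block_model_image(A, F)

# Features 0 and 1 are block indicators; feature 2 is unrelated to the blocks.
Y = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
], dtype=float)

good = subset_score(Y, [0, 1], F, M)
bad = subset_score(Y, [2], F, M)
print(good > bad)
```

In this toy case the subset containing the two block-indicator features induces a graph identical to $\mathbf{A}$, so it scores higher than the unrelated feature. The sketch keeps $\mathbf{F}$ fixed and only checks preservation of $\mathbf{M}$; the abstract describes preserving both.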
