
AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph Convolution

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Abstract

Graph-based clustering plays an important role in clustering tasks. As the graph convolutional network (GCN), a variant of neural networks for graph-type data, has achieved impressive performance, it is attractive to investigate whether GCNs can be used to augment graph-based clustering methods on non-graph data, i.e., general data. However, given $n$ samples, graph-based clustering methods usually need at least $O(n^2)$ time to build graphs, and graph convolution requires nearly $O(n^2)$ time for a dense graph and $O(|\mathcal{E}|)$ for a sparse one with $|\mathcal{E}|$ edges. In other words, both graph-based clustering and GCNs suffer from severe inefficiency problems. To tackle this problem and further employ GCN to promote the capacity of graph-based clustering, we propose a novel clustering method, AnchorGAE. As the graph structure is not provided in general clustering scenarios, we first show how to convert a non-graph dataset into a graph by introducing a generative graph model, which is used to build GCNs. Anchors are generated from the original data to construct a bipartite graph such that the computational complexity of graph convolution is reduced from $O(n^2)$ and $O(|\mathcal{E}|)$ to $O(n)$. The succeeding steps for clustering can be easily designed as $O(n)$ operations. Interestingly, the anchors naturally lead to a siamese GCN architecture. The bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind the data. Finally, we theoretically prove that a simple update will lead to degeneration, and a specific strategy is designed accordingly.
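To illustrate why anchors can make graph convolution linear in $n$, the sketch below uses a common anchor-graph construction. It is an assumption for illustration and not necessarily the exact AnchorGAE formulation. Each sample is linked to its `k` nearest anchors through a row-normalized matrix `B` of shape `(n, m)`, and the implied sample-to-sample affinity `W = B @ diag(1/col) @ B.T` is never formed. Features are propagated samples → anchors → samples, which costs $O(nmd)$ for $m$ anchors and $d$ features, i.e., $O(n)$ when $m$ and $d$ are fixed. The function names, the Gaussian-style weights, and the parameter `k` are hypothetical choices.

```python
import numpy as np


def build_anchor_graph(X, anchors, k=3):
    """Row-stochastic sample-to-anchor matrix B (n x m), k nonzeros per row.

    Assumption: weights are a softmax over negative squared distances to the
    k nearest anchors; the paper's generative graph model may differ.
    """
    # Squared Euclidean distances between every sample and every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    n, m = d2.shape
    # Indices of the k nearest anchors for each sample (requires k <= m).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    d2_k = d2[rows, idx]
    # Subtract the row minimum so at least one weight equals 1 (no underflow).
    w = np.exp(-(d2_k - d2_k.min(axis=1, keepdims=True)))
    B = np.zeros((n, m))
    B[rows, idx] = w
    B /= B.sum(axis=1, keepdims=True)
    return B


def anchor_propagate(B, X):
    """Compute W @ X with W = B diag(1/col) B^T without building the n x n W."""
    col = B.sum(axis=0)
    col[col == 0] = 1.0  # anchors with no neighbors contribute nothing
    anchor_feats = (B.T @ X) / col[:, None]  # aggregate samples into anchors
    return B @ anchor_feats  # distribute anchor features back to samples
```

For clarity `B` is stored densely here; a sparse matrix with `k` nonzeros per row would reduce the cost of each product to $O(nkd)$.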
