fMBN-E: Efficient Unsupervised Network Structure Ensemble and Selection for Clustering

5 July 2021

Abstract

Unsupervised nonlinear dimensionality reduction and clustering are known to be sensitive to the choice of hyperparameters, particularly for deep-learning-based methods, which hinders their practical use. Selecting a proper network structure, which may differ dramatically across applications, is a hard problem for deep models when little prior knowledge of the data is available. In this paper, we explore ensemble learning and selection techniques for automatically determining the optimal network structure of a deep model named multilayer bootstrap networks (MBN). Specifically, we first propose an MBN ensemble (MBN-E) algorithm, which concatenates the sparse outputs of a set of MBN base models with different network structures into a new representation. Because training an ensemble of MBNs is expensive, we propose a fast version of MBN-E (fMBN-E), which replaces the random data resampling step of MBN-E with the resampling of random similarity scores. Theoretically, fMBN-E is even faster than a single standard MBN. We then take the new representation produced by MBN-E as a reference for selecting the optimal MBN base models. Two kinds of ensemble selection criteria, optimization-like selection criteria and distribution divergence criteria, are applied. Importantly, MBN-E and its ensemble selection techniques retain the simple formulation of MBN, which is based on one-nearest-neighbor learning, and reach state-of-the-art performance without manual hyperparameter tuning. Empirically, fMBN-E is hundreds of times faster than MBN-E without performance degradation. The source code is available at http://www.xiaolei-zhang.net/mbn-e.htm.
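
To make the idea of concatenating sparse one-nearest-neighbor outputs concrete, the following is a minimal sketch and not the authors' implementation. It assumes each base model is reduced to a single layer of randomly resampled centroids (a real MBN stacks several such layers), and the names `one_nn_layer`, `ensemble_representation`, `ks`, and `n_clusterings` are hypothetical, chosen only for illustration.

```python
import numpy as np


def one_nn_layer(X, k, n_clusterings, rng):
    """Encode X with several random one-nearest-neighbor clusterings.

    Illustrative assumption: each clustering resamples k data points as
    centroids and emits a one-hot (sparse) code per sample.
    """
    n = X.shape[0]
    codes = []
    for _ in range(n_clusterings):
        # Randomly resample k data points to serve as centroids.
        centers = X[rng.choice(n, size=k, replace=False)]
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # One-nearest-neighbor assignment, stored as a one-hot code.
        onehot = np.zeros((n, k))
        onehot[np.arange(n), d2.argmin(axis=1)] = 1.0
        codes.append(onehot)
    return np.hstack(codes)


def ensemble_representation(X, ks, n_clusterings=5, seed=0):
    """Concatenate the sparse codes of base models that differ in k.

    Here "different network structures" is simplified to different
    centroid counts k (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    return np.hstack([one_nn_layer(X, k, n_clusterings, rng) for k in ks])
```

Calling `ensemble_representation(X, ks=[2, 4], n_clusterings=3)` on an array `X` with 20 rows would yield a 20 x 18 binary matrix, in which each row contains exactly one nonzero entry per clustering. The fast variant described in the abstract, which resamples similarity scores instead of data, is not covered by this sketch.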
