G-Mapper: Learning a Cover in the Mapper Construction

SIAM Journal on Mathematics of Data Science (SIMODS), 2023
Main: 19 pages, 17 figures, 2 tables; bibliography: 3 pages
Abstract

The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on G-means clustering, which searches for the optimal number of clusters in k-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments on synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running fast.
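
The splitting idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one-dimensional filter values, and the function names (`anderson_darling_normal`, `fit_two_gaussians`, `learn_cover`), the 5% critical value of 0.752, the `overlap` parameter, and the rule for placing the two overlapping child intervals are all assumptions chosen for illustration. An interval is split whenever the Anderson-Darling statistic of the points it contains exceeds the critical value, and the split location comes from a two-component Gaussian mixture fitted with plain EM.

```python
import math
import random
import statistics


def anderson_darling_normal(values):
    """Small-sample-corrected A^2 statistic for normality (mean and variance estimated)."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = statistics.fmean(xs), statistics.stdev(xs)
    if sigma == 0:
        return 0.0  # degenerate sample: treat as "do not split"
    # Normal CDF at each point, clipped so the logarithms stay finite.
    cdf = [min(max(0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2)))), 1e-12), 1 - 1e-12)
           for x in xs]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


def fit_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM; return (m1, s1, m2, s2), m1 <= m2."""
    xs = sorted(values)
    half = len(xs) // 2
    means = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sigmas = [max(statistics.pstdev(xs), 1e-6)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, sigmas)]
            norm = sum(dens)
            resp.append([d / norm for d in dens] if norm > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
    (m1, s1), (m2, s2) = sorted(zip(means, sigmas))
    return m1, s1, m2, s2


def learn_cover(values, overlap=0.2, critical=0.752, min_points=20, max_intervals=16):
    """Repeatedly split intervals whose points fail the normality test (illustrative rule)."""
    pending = [(min(values), max(values))]
    done = []
    while pending:
        a, b = pending.pop()
        inside = [x for x in values if a <= x <= b]
        within_budget = len(pending) + len(done) + 1 < max_intervals
        if (within_budget and len(inside) >= min_points
                and anderson_darling_normal(inside) > critical):
            m1, s1, m2, s2 = fit_two_gaussians(inside)
            gap = m2 - m1
            if gap > 0:
                # Assumed rule: cut between the two means, closer to the tighter
                # component, then widen both children so that they overlap.
                cut = m1 + gap * s1 / (s1 + s2)
                half_width = 0.5 * overlap * gap
                left = (a, min(cut + half_width, b))
                right = (max(cut - half_width, a), b)
                if left != (a, b) and right != (a, b):
                    pending += [left, right]
                    continue
        done.append((a, b))
    return sorted(done)


# Usage: synthetic bimodal filter values should yield at least two intervals.
random.seed(0)
filter_values = ([random.gauss(0.0, 1.0) for _ in range(200)]
                 + [random.gauss(10.0, 1.0) for _ in range(200)])
cover = learn_cover(filter_values)
print(cover)
```

The `max_intervals` cap guarantees termination, since every split adds exactly one interval. The resulting intervals could then serve as the cover passed to any Mapper implementation; how the paper itself places the overlapping intervals and picks its test threshold is not stated in this abstract.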
