Order preserving hierarchical agglomerative clustering

We present a method for hierarchical clustering of directed acyclic graphs and other strictly partially ordered data that preserves the order structure of the data. In particular, if we have $a < b$ in the original data and denote their respective clusters by $[a]$ and $[b]$, then we get $[a] < [b]$ in the produced clustering. The clustering uses standard linkage functions, such as single and complete linkage, and is a generalisation of hierarchical clustering of non-ordered sets. To achieve this, we define the output of running hierarchical clustering algorithms on strictly ordered data to be partial dendrograms: sub-trees of classical dendrograms with several connected components. We then construct an embedding of partial dendrograms over a set into the family of ultrametrics over the same set. An optimal hierarchical clustering is then defined as follows: given a collection of partial dendrograms, the optimal clustering is the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure, measured in the $p$-norm. Thus, the method is a combination of classical hierarchical clustering and ultrametric fitting.
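
The two criteria in the abstract, order preservation and closeness of an ultrametric to the original dissimilarity, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`preserves_order`, `is_ultrametric`, `p_norm_distance`, `closest_ultrametric`), the toy data, and the hand-written candidate ultrametrics are all assumptions made for illustration, and the embedding of partial dendrograms into ultrametrics is not reproduced here.

```python
def pair(x, y):
    """Unordered pair used as a dictionary key (hypothetical helper)."""
    return frozenset((x, y))

def p_norm_distance(d, u, p=2.0):
    """p-norm of the difference between two dissimilarities on the same pairs."""
    return sum(abs(d[k] - u[k]) ** p for k in d) ** (1.0 / p)

def is_ultrametric(u, points):
    """Check the strong triangle inequality u(x,z) <= max(u(x,y), u(y,z))."""
    def dist(x, y):
        return 0.0 if x == y else u[pair(x, y)]
    return all(
        dist(x, z) <= max(dist(x, y), dist(y, z))
        for x in points for y in points for z in points
    )

def preserves_order(order_pairs, cluster_of):
    """True if a < b always gives distinct clusters and the relation
    induced on the clusters contains no cycle."""
    edges = set()
    for a, b in order_pairs:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            return False  # two comparable elements were merged
        edges.add((ca, cb))
    # Cycle check: repeatedly remove clusters with no incoming edge.
    nodes = set(cluster_of.values())
    while nodes:
        sources = {
            n for n in nodes
            if not any(dst == n and src in nodes for src, dst in edges)
        }
        if not sources:
            return False  # every remaining cluster lies on a cycle
        nodes -= sources
    return True

def closest_ultrametric(d, candidates, p=2.0):
    """Name of the candidate ultrametric nearest to d in the p-norm."""
    return min(candidates, key=lambda name: p_norm_distance(d, candidates[name], p))

# Toy data (assumed): three points and a dissimilarity that is not an ultrametric.
points = ["a", "b", "c"]
d = {pair("a", "b"): 1.0, pair("a", "c"): 2.0, pair("b", "c"): 3.0}

# Two hand-written candidate ultrametrics, each read off a small dendrogram.
u1 = {pair("a", "b"): 1.0, pair("a", "c"): 2.0, pair("b", "c"): 2.0}
u2 = {pair("a", "b"): 3.0, pair("a", "c"): 2.0, pair("b", "c"): 3.0}
candidates = {"u1": u1, "u2": u2}

best = closest_ultrametric(d, candidates, p=2.0)
```

In this toy example `u1` is the nearer candidate, but it corresponds to merging `a` and `b` first. If the data carried the relation a < b, that merge would fail `preserves_order`, which is why the fitting criterion has to be applied only within the collection of order-preserving candidates.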