Rank-based linkage I: triplet comparisons and oriented simplicial complexes

Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart'' the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.
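
The final step described above, keeping only the links whose in-sway meets a threshold and partitioning the objects into connected components, can be illustrated with a minimal Python sketch. It assumes the in-sway weights have already been computed; the abstract does not define in-sway, so the weights below are invented toy values, and the function name `components_at_threshold`, the threshold name `t`, and the union-find approach are illustrative choices rather than the authors' implementation.

```python
from collections import defaultdict


def components_at_threshold(objects, in_sway, t):
    """Partition `objects` into connected components of the graph whose
    links are the pairs with in-sway >= t.

    `in_sway` maps a frozenset {x, y} to a numeric weight. How those weights
    are obtained is outside this sketch (assumed given).
    """
    parent = {x: x for x in objects}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for link, weight in in_sway.items():
        if weight >= t:
            x, y = tuple(link)
            parent[find(x)] = find(y)

    blocks = defaultdict(set)
    for x in objects:
        blocks[find(x)].add(x)
    # Sort for a deterministic, readable result.
    return sorted(sorted(block) for block in blocks.values())


# Toy data: hypothetical objects and in-sway values, for illustration only.
objects = ["a", "b", "c", "d", "e"]
in_sway = {
    frozenset({"a", "b"}): 5,
    frozenset({"b", "c"}): 3,
    frozenset({"d", "e"}): 4,
    frozenset({"c", "d"}): 1,
}

for t in (5, 4, 3, 1):
    print(t, components_at_threshold(objects, in_sway, t))
```

Because lowering the threshold can only add links, each partition in this sketch is a coarsening of the one before it: blocks merge as the threshold decreases and never split.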