The Four Point Permutation Test for Latent Block Structure in Incidence Matrices

Abstract

Transactional data may be represented as a bipartite graph $G := (L \cup R, E)$, where $L$ denotes agents, $R$ denotes objects visible to many agents, and an edge in $E$ denotes an interaction between an agent and an object. Unsupervised learning seeks to detect block structures in the adjacency matrix $Z$ between $L$ and $R$, thus grouping together sets of agents with similar object interactions. New results on quasirandom permutations suggest a non-parametric four point test to measure the amount of block structure in $G$, with respect to vertex orderings on $L$ and $R$. Take disjoint 4-edge random samples, order these four edges by left endpoint, and count the relative frequencies of the $4!$ possible orderings of the right endpoint. When these orderings are equiprobable, the edge set $E$ corresponds to a quasirandom permutation $\pi$ of $|E|$ symbols. Total variation distance of the relative frequency vector away from the uniform distribution on 24 permutations measures the amount of block structure. Such a test statistic, based on $\lfloor |E|/4 \rfloor$ samples, is computable in $O(|E|/p)$ time on $p$ processors. Possibly block structure may be enhanced by precomputing natural orders on $L$ and $R$, related to the second eigenvector of graph Laplacians. In practice this takes $O(d |E|)$ time, where $d$ is the graph diameter. Five open problems are described.
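
The sampling procedure described in the abstract can be illustrated with a minimal serial sketch. It is not the authors' implementation: the function name `four_point_statistic`, the `seed` parameter, the encoding of edges as `(left, right)` pairs of comparable vertex ranks, and the random breaking of ties between equal endpoints are all assumptions made for illustration. The abstract does not say how ties are handled, and the parallel $O(|E|/p)$ version is not shown.

```python
import random
from collections import Counter
from itertools import permutations


def four_point_statistic(edges, seed=0):
    """Total variation distance between the empirical distribution of
    4-edge order patterns and the uniform distribution on 24 permutations.

    `edges` is a sequence of (left, right) pairs whose entries are the
    positions of the endpoints in the chosen vertex orderings.
    Ties are broken at random (an assumption, not stated in the source).
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)  # consecutive blocks become disjoint random samples

    n_samples = len(shuffled) // 4  # floor(|E| / 4) samples
    if n_samples == 0:
        raise ValueError("need at least four edges")

    counts = Counter()
    for k in range(n_samples):
        sample = shuffled[4 * k : 4 * k + 4]
        # Order the four edges by left endpoint.
        sample.sort(key=lambda e: (e[0], rng.random()))
        # Rank the right endpoints to obtain a permutation of {0, 1, 2, 3}.
        rights = [(e[1], rng.random()) for e in sample]
        order = sorted(range(4), key=lambda i: rights[i])
        pattern = [0] * 4
        for rank, i in enumerate(order):
            pattern[i] = rank
        counts[tuple(pattern)] += 1

    # Total variation distance from the uniform distribution on S_4.
    return 0.5 * sum(
        abs(counts[p] / n_samples - 1.0 / 24.0)
        for p in permutations(range(4))
    )
```

With this sketch, a perfectly monotone edge set such as `[(i, i) for i in range(400)]` yields only the identity pattern, so the statistic reaches its maximum of $23/24$. Edges with independent, uniformly random endpoints should give a value near zero, up to sampling noise.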
