This paper presents DwarvesGraph, the first compilation-based graph pattern mining (GPM) system based on pattern decomposition algorithms, which decompose a pattern into several subpatterns and find the count of each. Such algorithms can be orders of magnitude faster than the algorithms used in current GPM systems because the execution time of pattern enumeration increases drastically with pattern size. We define a novel partial-embedding-centric programming model that supports various applications. We propose an efficient on-the-fly aggregation of subpattern embeddings to reduce memory consumption and random accesses. The DwarvesGraph compiler, using an abstract syntax tree (AST) as its intermediate representation (IR), can apply conventional optimizations and a novel pattern-aware loop rewriting optimization to eliminate redundant computation that cannot be removed with standard methods. To estimate implementation cost based on the AST, we propose a simple locality-aware cost model and an advanced approximate-mining-based cost model to accurately capture the characteristics of real-world graphs. DwarvesGraph fully automates algorithm generation, optimization, and selection in the search space. As a general GPM system, DwarvesGraph achieves performance much closer to that of the best native pattern decomposition algorithms.
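To make the idea of pattern decomposition concrete, the following minimal Python sketch counts 3-edge paths (four distinct vertices) in a small undirected graph in two ways: by cutting the pattern at its middle edge and combining subpattern counts, and by brute-force enumeration. It is an illustration under our own assumptions, not code generated by DwarvesGraph; all function names are hypothetical, and vertices are assumed to be comparable integers. Cutting at the middle edge (u, v) leaves two one-vertex subpatterns whose counts multiply to (deg(u) - 1)(deg(v) - 1); the combinations in which both subpatterns pick the same vertex form a triangle rather than a path, so three per triangle are subtracted.

```python
from itertools import permutations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Count each triangle once by requiring u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def count_3edge_paths_decomposed(adj):
    """Decomposition-style count: no 4-vertex embedding is enumerated."""
    product_sum = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once as the middle edge
                # Left subpattern: a neighbor of u other than v.
                # Right subpattern: a neighbor of v other than u.
                product_sum += (len(adj[u]) - 1) * (len(adj[v]) - 1)
    # Remove combinations where both sides chose the same vertex:
    # each triangle contributes one such case per edge, i.e. three in total.
    return product_sum - 3 * count_triangles(adj)


def count_3edge_paths_enumerated(adj):
    """Brute-force baseline: test every ordered 4-tuple of distinct vertices."""
    count = 0
    for a, b, c, d in permutations(adj, 4):
        # a < d keeps one of the two orientations of each path.
        if a < d and b in adj[a] and c in adj[b] and d in adj[c]:
            count += 1
    return count
```

The decomposed version touches each edge once plus a triangle count, whereas the baseline iterates over all ordered 4-tuples of vertices; this gap is the kind of cost difference that grows with pattern size.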