
Massively Parallel Algorithms for Small Subgraph Counting

Abstract

Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is a fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph $G=(V, E)$ with $n$ vertices, $m$ edges and $T$ triangles, our first result is an algorithm that outputs a $(1+\varepsilon)$-approximation to $T$, with asymptotically \emph{optimal round and total space complexity} provided any $S \geq \max{(\sqrt m, n^2/m)}$ space per machine and assuming $T=\Omega(\sqrt{m/n})$. Our result gives a quadratic improvement on the bound on $T$ over previous works. We also provide a simple extension of our result to counting \emph{any} subgraph of size $k$ for constant $k \geq 1$. Our second result is an $O_{\varepsilon}(\log \log n)$-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the \emph{arboricity} $\alpha$ of the input graph. We extend this result to exactly counting $k$-cliques for any constant $k$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in total space.
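
To make the counted quantity $T$ concrete, the following is a minimal sequential sketch in Python that counts triangles exactly by orienting each edge from its lower-degree endpoint to its higher-degree endpoint. This is a standard single-machine technique shown for illustration only; it is not the MPC algorithm of the paper, and the function name `count_triangles` and the edge-list input format are assumptions made for this example.

```python
from itertools import combinations


def count_triangles(edges):
    """Exactly count triangles in a simple undirected graph given as an edge list.

    Illustrative single-machine sketch (hypothetical helper, not the paper's method).
    Vertex labels are assumed to be mutually comparable (e.g. all ints).
    """
    # Build adjacency sets, skipping self-loops; sets absorb duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Rank vertices by (degree, label) and keep only neighbors of higher rank.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    # Each triangle is counted exactly once, at its lowest-ranked vertex.
    total = 0
    for v in adj:
        for a, b in combinations(out[v], 2):
            if b in adj[a]:
                total += 1
    return total
```

For example, calling `count_triangles` on the six edges of the complete graph on four vertices counts its four triangles. Orienting edges toward higher-degree endpoints keeps the out-neighborhoods small on sparse graphs, which is the intuition behind parametrizing cost by a sparsity measure such as arboricity.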
