Distributed graphs: in search of fast, low-latency, resource-efficient, semantics-rich Big-Data processing

26 November 2019

Abstract

Large graphs can be processed on a single high-memory machine or on distributed systems, with a focus either on querying the graph or on executing algorithms through high-level APIs. For systems focused on graph processing, common use cases consist of executing algorithms such as PageRank or community detection on top of distributed systems that read from storage (local or distributed), compute, and write results back to storage, in a manner akin to a read-eval-write loop. Graph analysis tasks face new hurdles with the additional dimension of evolving data. The systems we detail herein treat the evolution of data as stream processing, which offers semantics for aggregating results in windows based on element counts or on different definitions of time. However, these semantics have yet to be incorporated into the expressiveness of graph processing itself. We first detail the existing types of graph analysis tasks, and then highlight state-of-the-art solutions for different aspects of these tasks. The resulting analysis identifies the need for systems that can effectively extend this read-eval-write style of execution by maintaining a graph (or parts of it) in a cluster's memory for reuse, skipping the recurring I/O overhead present in all of these systems.
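
As a rough illustration of the idea in the abstract, the sketch below keeps a graph in memory across count-based windows of an edge stream instead of reloading it from storage for every computation. It is a single-process toy, not the method of any system discussed in the paper; the names `count_windows` and `out_degrees_per_window` and the choice of out-degree as the computed result are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # (source vertex, destination vertex)


def count_windows(stream: Iterable[Edge], size: int) -> Iterator[List[Edge]]:
    """Group an edge stream into tumbling windows of `size` elements.

    A final, shorter window is emitted if the stream does not divide evenly.
    """
    window: List[Edge] = []
    for edge in stream:
        window.append(edge)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window


def out_degrees_per_window(stream: Iterable[Edge], size: int) -> List[Dict[str, int]]:
    """Apply each window to an in-memory graph and recompute out-degrees.

    The adjacency map lives across windows (hypothetical stand-in for a graph
    kept in cluster memory), so no window triggers a reload from storage.
    """
    graph: Dict[str, Set[str]] = {}
    results: List[Dict[str, int]] = []
    for window in count_windows(stream, size):
        for src, dst in window:
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set())  # register vertices with no out-edges
        results.append({vertex: len(nbrs) for vertex, nbrs in graph.items()})
    return results


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("a", "b")]
    for i, degrees in enumerate(out_degrees_per_window(edges, size=2)):
        print(i, degrees)
```

A time-based window would replace the element counter in `count_windows` with a comparison on a timestamp carried by each edge; the in-memory graph would be reused in the same way.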
