Query Workload-based RDF Graph Fragmentation and Allocation

As RDF datasets grow ever larger, distributed database systems are needed to manage them. In distributed RDF data design, the data is commonly partitioned into parts, called fragments, which are then distributed across sites. The distribution design therefore consists of two steps: fragmentation and allocation. In this paper, we propose a method that exploits the intrinsic structural similarities among the queries in an RDF query workload to guide fragmentation and allocation, with the aim of reducing the number of crossing matches and the communication cost of SPARQL query evaluation. Specifically, we mine and select frequent access patterns that reflect the characteristics of the workload. Based on these frequent access patterns, we propose two fragmentation strategies for dividing RDF graphs, each meeting a different query processing objective. After fragmentation, we discuss how to allocate the fragments to sites. Finally, we discuss how to evaluate a query given the results of fragmentation and allocation. Extensive experiments confirm the superior performance of our proposed solutions.
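
To make the pipeline of mining, fragmentation, and allocation concrete, the following is a minimal sketch and not the method proposed in the paper. It rests on simplifying assumptions: an access pattern is reduced to a set of one or two predicates that co-occur in a query (rather than a subgraph pattern), a fragment simply collects the triples whose predicate appears in a pattern, and allocation is a greedy size-balancing heuristic. All function names, the `min_support` parameter, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_patterns(workload, min_support):
    """Count predicate sets (size 1 or 2) that co-occur in a query; keep frequent ones.

    Assumption: a query is a list of (subject, predicate, object) triple patterns.
    """
    counts = Counter()
    for query in workload:
        predicates = sorted({p for (_s, p, _o) in query})
        for size in (1, 2):
            for combo in combinations(predicates, size):
                counts[frozenset(combo)] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


def fragment(triples, patterns):
    """Build one fragment per pattern: the triples whose predicate the pattern mentions.

    Triples may be replicated across fragments, and triples with infrequent
    predicates end up in no fragment; a real design would need a fallback.
    """
    return {pattern: [t for t in triples if t[1] in pattern] for pattern in patterns}


def allocate(fragments, num_sites):
    """Greedy balancing: place the largest fragments first on the least-loaded site."""
    loads = [0] * num_sites
    placement = {}
    for pattern, frag in sorted(fragments.items(), key=lambda kv: -len(kv[1])):
        site = loads.index(min(loads))
        placement[pattern] = site
        loads[site] += len(frag)
    return placement


# Hypothetical workload and data, for illustration only.
workload = [
    [("?x", "knows", "?y"), ("?y", "worksAt", "?z")],
    [("?a", "knows", "?b"), ("?b", "worksAt", "?c")],
    [("?x", "name", "?n")],
]
triples = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("alice", "name", "Alice"),
]

patterns = mine_frequent_patterns(workload, min_support=2)
fragments = fragment(triples, patterns)
placement = allocate(fragments, num_sites=2)
```

In this toy setting, a query that joins `knows` and `worksAt` can be answered from the single fragment built for that predicate pair, which illustrates why workload-aware fragments can reduce crossing matches.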