Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases

23 October 2013

Abstract

Queries over probabilistic databases return tuples together with scores for ranking them. Determining these scores is #P-hard in general. For some queries, called safe queries, the scores can be evaluated in PTIME inside a standard relational database using safe query plans, whereas for a general unsafe query they must be evaluated with a general-purpose inference engine at high cost. We propose a new approach, called dissociation, by which these scores are evaluated inside the database engine for every query. We achieve this by changing from the commonly used possible-world semantics to a new semantics that we call propagation. Conceptually, a dissociated query is obtained from an unsafe query by adding extraneous variables to some atoms until the query becomes safe. We show that the scores for chain queries before and after dissociation correspond to two well-known scoring functions on graphs, namely network reliability (which is #P-hard) and propagation (which is related to PageRank and is in PTIME). We then define a unique propagation score for an answer to a self-join-free conjunctive query and prove that it is always an upper bound on the score given by the possible-world semantics. We further show that the propagation score can always be evaluated with an instance-independent query plan, that the propagation score is identical to the score given by the possible-world semantics for all safe queries, and that the concept of propagation is a natural generalization of plans for safe queries to plans for unsafe queries. We give several optimizations and provide experimental evidence for the quality and effectiveness of our approach.
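The idea of adding a variable to an atom until the query becomes safe can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the Boolean query q() :- R(x), S(x,y), T(y) is used as a standard example of an unsafe query, the relations are plain Python dictionaries mapping tuples to probabilities, and the function names `exact_probability` and `dissociated_score` are hypothetical. The sketch dissociates R(x) into R'(x,y), evaluates the resulting safe plan with independent-or and independent-and steps, and compares the result with the brute-force possible-world probability.

```python
from itertools import product


def exact_probability(r, s, t):
    """Possible-world probability of q() :- R(x), S(x,y), T(y), by enumeration.

    Exponential in the number of tuples; only meant for tiny instances.
    """
    tuples = [("R", k) for k in r] + [("S", k) for k in s] + [("T", k) for k in t]
    probs = [r[k] for k in r] + [s[k] for k in s] + [t[k] for k in t]
    total = 0.0
    for world in product([False, True], repeat=len(tuples)):
        present = {tup for tup, bit in zip(tuples, world) if bit}
        # Probability of this world under tuple independence.
        weight = 1.0
        for p, bit in zip(probs, world):
            weight *= p if bit else 1.0 - p
        # The query holds if some S(x,y) has matching R(x) and T(y) present.
        if any(
            ("R", x) in present and ("S", (x, y)) in present and ("T", y) in present
            for (x, y) in s
        ):
            total += weight
    return total


def dissociated_score(r, s, t):
    """Score of the safe plan obtained by dissociating R(x) into R'(x, y).

    Each R(x) is treated as an independent copy for every y, which makes
    the query hierarchical and lets it be evaluated group by group.
    """
    none_succeeds = 1.0
    for y, t_prob in t.items():
        no_x_for_y = 1.0
        for (x, y2), s_prob in s.items():
            if y2 == y and x in r:
                no_x_for_y *= 1.0 - r[x] * s_prob  # independent join R'(x,y), S(x,y)
        # Independent project on x, join with T(y), then project on y.
        none_succeeds *= 1.0 - t_prob * (1.0 - no_x_for_y)
    return 1.0 - none_succeeds


# Hypothetical toy instance: one R tuple shared by two S tuples.
r = {1: 0.5}
s = {(1, "a"): 0.5, (1, "b"): 0.5}
t = {"a": 0.5, "b": 0.5}
print(exact_probability(r, s, t))   # 0.21875
print(dissociated_score(r, s, t))   # 0.234375
```

On this toy instance the dissociated score exceeds the exact probability because the single tuple R(1) is counted as two independent copies, one per value of y. This matches the upper-bound behavior stated in the abstract, although the sketch evaluates only one particular dissociation and does not claim to reproduce the paper's definition of the unique propagation score.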
