Scalable Query Processing in Probabilistic Databases with SPROUT

Event details
Date | 25.06.2009 |
Hour | 10:15 |
Speaker | Prof. Dan Olteanu, Oxford University, UK |
Location | |
Category | Conferences - Seminars |
In this talk I will address the problem of query evaluation on probabilistic databases and present the SPROUT query engine, which is under development at Oxford. SPROUT is publicly available as an extension of the PostgreSQL 8.3.3 query engine. It is specifically tailored to tractable conjunctive queries with inequalities and to queries that are not tractable in general but become tractable on probabilistic databases restricted by functional dependencies.
The major components of SPROUT are an aggregation operator for exact confidence computation, which can be naturally integrated into existing relational query plans, and optimizations that allow the aggregation operator, or parts of it, to be pushed past joins. The operator is based on a fundamental connection between tractable queries and linear-size Ordered Binary Decision Diagrams (OBDDs) representing the uncertainty in the answers to such queries.
I will then discuss the secondary-storage algorithm for the aggregation operator. This algorithm computes the probability of OBDDs for tractable queries without materializing them, in a few scans over the data and with main-memory requirements that depend only on the query size. Experiments with gigabytes of TPC-H data show that SPROUT improves on state-of-the-art exact and approximate techniques by orders of magnitude.
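To illustrate what "the probability of an OBDD" means, here is a minimal in-memory sketch in Python. It is not SPROUT's algorithm: SPROUT avoids materializing the OBDD and works on secondary storage, whereas this sketch builds a small diagram by hand and evaluates it bottom-up. The names `Node` and `obdd_probability`, the example lineage x1 AND (y1 OR y2), and the variable probabilities are all hypothetical, and the variables are assumed to be independent.

```python
from typing import Dict, NamedTuple, Union


class Node(NamedTuple):
    """A decision node: follow `low` if `var` is false, `high` if it is true."""
    var: str
    low: "OBDD"
    high: "OBDD"


# A diagram is either a terminal (True/False) or a decision node.
OBDD = Union[bool, Node]


def obdd_probability(root: OBDD, prob: Dict[str, float]) -> float:
    """Probability that the diagram evaluates to True, assuming the
    variables are independent and prob[v] = P(v is true)."""
    cache: Dict[int, float] = {}

    def visit(node: OBDD) -> float:
        if isinstance(node, bool):
            return 1.0 if node else 0.0
        key = id(node)  # shared subgraphs are evaluated only once
        if key not in cache:
            p = prob[node.var]
            # Shannon expansion on the node's variable.
            cache[key] = (1.0 - p) * visit(node.low) + p * visit(node.high)
        return cache[key]

    return visit(root)


# Hand-built OBDD for x1 AND (y1 OR y2) under the order x1, y1, y2.
y2 = Node("y2", False, True)
y1 = Node("y1", y2, True)
root = Node("x1", False, y1)

probs = {"x1": 0.5, "y1": 0.4, "y2": 0.5}
result = obdd_probability(root, probs)  # 0.5 * (1 - 0.6 * 0.5) = 0.35
```

Each node is visited once, so the computation is linear in the size of the diagram; this is why linear-size OBDDs for tractable queries give efficient exact confidence computation.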
Prof. Olteanu's homepage
Practical information
- General public
- Free