Compile-Time Code Generation of Embedded Data-Intensive Query Languages

Event details
Date | 06.10.2017 |
Hour | 11:00 - 12:00 |
Speaker | Prof. Leonidas Fegaras |
Location | |
Category | Conferences - Seminars |
Many emerging Big-Data programming environments, such as Spark and Flink, provide powerful APIs, inspired by functional programming, that consist of a small number of higher-order operations. However, because of the complexity involved in developing and fine-tuning data analysis applications with these APIs, many programmers prefer to code their distributed applications in declarative languages, such as Hive and Spark SQL.

Unfortunately, current data analysis query languages, which are typically based on the relational model, cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well integrated with the host programming language: they are based on an incompatible data model, and they are checked for correctness at run time, which significantly lengthens program development.

In this talk, I will introduce a new query language for data-intensive scalable computing, called DIQL, that is deeply embedded in Scala, and a query optimization framework that optimizes and translates DIQL queries to bytecode at compile time. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively.

I will also present an algebra for data-intensive scalable computing based on monoid homomorphisms. It consists of a small set of operations that capture most features supported by current domain-specific languages for data-centric distributed computing. The DIQL query optimizer, which is based on the monoid algebra, can find any possible join in a query, including joins hidden across deeply nested queries, thus unnesting any form of query nesting.
Practical information
- General public
- Free
- This event is internal
Organizer
- Prof. Anastasia Ailamaki
Contact
- Dimitra Tsaoussis-Melissargos