Declarative query processing in imperative managed runtimes
The falling price of main memory has led to the development and growth of in-memory databases. At the same time, advances in memory technology, such as persistent memory, make it possible to have a truly universal storage model, accessed directly through the programming language in the context of a fully managed runtime. This environment is further enhanced by language-integrated query, which has gained significant traction and has emerged as a generic, safe method of combining programming languages with databases, with considerable software engineering benefits. Our perspective on language-integrated query is that it combines the runtime of a programming language with that of a database system. This leads to the question of how to tightly integrate these two runtimes. Our proposal is to apply just-in-time code generation and compilation techniques that have recently been developed for general query processing. The idea is that instead of compiling queries to query plans, which are then interpreted, the system generates customized native code that is then compiled and executed by the query engine. At the same time, we must enable the runtime to take advantage of advances in main memory technology and, primarily, persistent memory. Persistent memory is byte-addressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry renders the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into data processing in managed runtimes. We do so in the context of fundamental query processing operations and introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms or, alternatively, allows the developer to tune the write profile of the algorithms.
This dynamic adaptation fits in well with the notion of just-in-time compilation. We present the results of our work in integrating database and programming language runtimes through code generation and extensive just-in-time adaptation. Our techniques deliver significant performance improvements over non-integrated solutions. Our work takes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in memory, and will be written in a single programming language employing higher-level APIs and language-integrated query to provide transparent and highly efficient querying.
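
As a rough illustration of the contrast drawn above between interpreting a query plan and generating code specialized to the query, the following minimal Python sketch may help. It is not the system described here. The plan layout, the names `PLAN`, `ROWS`, `interpret` and `generate`, and the example query are all hypothetical, and the generated code is compiled to Python bytecode rather than to native code.

```python
from typing import Callable

# Hypothetical plan for: SELECT SUM(price) FROM rows WHERE qty > 10
PLAN = {"filter_col": "qty", "op": ">", "const": 10, "sum_col": "price"}

# Hypothetical in-memory collection that the query runs over.
ROWS = [
    {"qty": 5, "price": 1.0},
    {"qty": 20, "price": 2.5},
    {"qty": 11, "price": 4.0},
]

# Comparison operators supported by this sketch.
OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def interpret(plan: dict, rows: list) -> float:
    """Interpreted execution: the plan is consulted again for every row."""
    total = 0.0
    for row in rows:
        if OPS[plan["op"]](row[plan["filter_col"]], plan["const"]):
            total += row[plan["sum_col"]]
    return total


def generate(plan: dict) -> Callable[[list], float]:
    """Code generation: emit source specialized to the plan, compile it once."""
    if plan["op"] not in OPS:
        raise ValueError(f"unsupported operator: {plan['op']}")
    # Column names and the constant are baked into the generated source,
    # so the compiled function no longer looks anything up in the plan.
    source = (
        "def query(rows):\n"
        "    total = 0.0\n"
        "    for row in rows:\n"
        f"        if row[{plan['filter_col']!r}] {plan['op']} {plan['const']!r}:\n"
        f"            total += row[{plan['sum_col']!r}]\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"]


if __name__ == "__main__":
    compiled_query = generate(PLAN)
    print(interpret(PLAN, ROWS), compiled_query(ROWS))
```

Both paths compute the same answer. The difference is that the generated function has the column names, operator and constant fixed in its body, which is the property a just-in-time query compiler exploits when it targets native code.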