IC Colloquium: Learning Models over Relational Databases
Event details
Date | 07.10.2019 |
Hour | 16:15 › 17:30 |
Location | |
Category | Conferences - Seminars |
By: Dan Olteanu - University of Oxford
Video of his talk
Abstract:
In this talk, I will make the case for a first-principles approach to machine learning over relational databases that exploits recent developments in database systems and theory. The input to learning classification and regression models is defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using statistical software packages. These three steps are expensive and unnecessary. Instead, one can cast the machine learning problem as a database problem by decomposing the learning task into a batch of aggregates over the feature extraction query and by computing this batch over the input database. Ongoing results show that the performance of this approach benefits tremendously from structural properties of the relational data and of the feature extraction query; such properties may be algebraic (semi-ring), combinatorial (hypertree width), or statistical (sampling). It also benefits from factorized query evaluation and query compilation. For a variety of models, including factorization machines, decision trees, and support vector machines, this approach may come with lower computational complexity than the materialization of the training dataset used by the mainstream approach. This translates to speed-ups of several orders of magnitude over state-of-the-art systems such as TensorFlow, R, Scikit-learn, and mlpack.
This work is part of the FDB project (https://fdbresearch.github.io) and based on collaboration with Maximilian Schleich (Oxford), Jakub Zavodny (Oxford), Milos Nikolic (Edinburgh), Mahmoud Abo-Khamis, Ryan Curtin, Hung Q. Ngo (RelationalAI), Ben Moseley (CMU), and XuanLong Nguyen (Michigan).
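As a rough illustration of the idea of decomposing a learning task into a batch of aggregates, the sketch below fits a one-feature least-squares line over the join of two toy relations. It is not taken from the talk or from the FDB code base: the relations `sales` and `stores`, the function names, and the assumption that each store key appears exactly once in `stores` are all hypothetical. The point it tries to show is that the five sums needed by the model can be computed by aggregating each relation separately and combining the partial results per join key, so the joined training dataset is never built.

```python
from collections import defaultdict

# Hypothetical toy relations (illustrative only, not from the talk).
# sales(store, units): one row per sale.
# stores(store, size): one row per store; store keys are assumed unique.
sales = [("a", 3.0), ("a", 5.0), ("b", 4.0), ("b", 6.0), ("b", 8.0)]
stores = [("a", 10.0), ("b", 20.0)]


def aggregates_materialized(sales, stores):
    """Baseline: materialize the join, then aggregate over its rows."""
    size_of = dict(stores)
    joined = [(size_of[s], u) for s, u in sales if s in size_of]
    n = float(len(joined))
    sx = sum(x for x, _ in joined)        # SUM(size)
    sy = sum(y for _, y in joined)        # SUM(units)
    sxx = sum(x * x for x, _ in joined)   # SUM(size * size)
    sxy = sum(x * y for x, y in joined)   # SUM(size * units)
    return n, sx, sy, sxx, sxy


def aggregates_pushed_down(sales, stores):
    """Same five aggregates, without ever building the joined rows."""
    cnt = defaultdict(float)     # store -> number of sales rows
    sum_u = defaultdict(float)   # store -> sum of units
    for s, u in sales:
        cnt[s] += 1.0
        sum_u[s] += u
    n = sx = sy = sxx = sxy = 0.0
    for s, x in stores:
        # Each store row joins with cnt[s] sales rows, so its
        # contribution is scaled by the per-key partial aggregates.
        n += cnt[s]
        sx += x * cnt[s]
        sy += sum_u[s]
        sxx += x * x * cnt[s]
        sxy += x * sum_u[s]
    return n, sx, sy, sxx, sxy


def fit_line(n, sx, sy, sxx, sxy):
    """Closed-form least squares for units ~ slope * size + intercept."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    print(fit_line(*aggregates_pushed_down(sales, stores)))
```

In this toy setting both functions return the same aggregates, but the second one touches each input relation once and its intermediate state grows with the number of distinct join keys rather than with the size of the join.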
Bio:
Dan Olteanu is Professor of Computer Science at the University of Oxford and Computer Scientist at RelationalAI. He received his PhD from the University of Munich in 2005. He spends his time understanding hard computational challenges around data processing and designing simple and scalable solutions towards these challenges. He has published over 70 papers in the areas of database systems, AI, and theoretical computer science, contributing to XML query processing, incomplete information and probabilistic databases, factorised databases, scalable and incremental in-database optimisation, and the commercial systems LogicBlox and RelationalAI. He co-authored the book « Probabilistic Databases » (2011). He has served as associate editor for PVLDB and IEEE TKDE, as track chair for IEEE ICDE’15, group leader for ACM SIGMOD’15, vice chair for ACM SIGMOD’17, and co-chair for AMW’18, and he is currently serving as associate editor for ACM TODS and the SIGMOD Record Database Principles column. He is the recipient of an ERC Consolidator grant (2016), an Oxford Outstanding Teaching award (2009), and the ICDT 2019 best paper award.
More information
Practical information
- General public
- Free
- This event is internal
Contact
- Host: Christoph Koch