Database Systems Optimizations for Machine Learning Operations

Event details
Date | 16.06.2025
Hour | 14:30 – 16:30
Speaker | Mathis Randl
Location |
Category | Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Anastasia Ailamaki
Thesis advisor: Prof. Anne-Marie Kermarrec
Co-examiner: Prof. Sanidhya Kashyap
Abstract
Recent machine learning systems increasingly rely on retrieving information from large external corpora at inference time. This report examines three key papers that define and refine the design of such retrieval-based architectures across different system levels. The first, Retrieval-Augmented Generation (RAG), introduces a hybrid model that combines dense retrieval with neural generation to improve factual accuracy and interpretability in knowledge-intensive NLP tasks. The second, FAISS, addresses the scalability challenges of dense vector search by proposing an efficient approximate nearest neighbor (ANN) framework optimized for GPU execution. The third, JUNO, further accelerates ANN search by leveraging sparsity in product quantization and mapping filtering operations to ray tracing cores. Together, these papers trace a progression from algorithmic design to systems optimization and finally to hardware-aware acceleration. The report concludes with a discussion of open challenges and outlines a research direction building on these foundations.
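For context, the following is a minimal sketch of how an IVF-PQ index of the kind popularized by FAISS might be built and queried from Python. The index parameters, the random stand-in vectors, and the assumption that the faiss package is installed are illustrative choices, not details taken from the selected papers.

```python
import numpy as np
import faiss  # assumes the faiss-cpu (or faiss-gpu) package is installed

d = 128          # vector dimensionality
nb = 100_000     # corpus size
nq = 5           # number of queries

rng = np.random.default_rng(42)
xb = rng.random((nb, d), dtype=np.float32)   # corpus embeddings (random stand-ins)
xq = rng.random((nq, d), dtype=np.float32)   # query embeddings

# IVF-PQ: a coarse quantizer partitions the space into nlist cells, and
# product quantization compresses each vector into m sub-codes of nbits bits.
nlist, m, nbits = 1024, 16, 8
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)      # learn coarse centroids and PQ codebooks
index.add(xb)        # encode and store the corpus
index.nprobe = 16    # cells visited per query: the recall/speed knob

D, I = index.search(xq, 5)   # approximate top-5 neighbours per query
print(I)                     # row q holds the ids of the 5 nearest vectors to xq[q]
```

In a retrieval-augmented pipeline, the returned ids would map back to text passages that are concatenated with the query before generation; the GPU and ray-tracing-core optimizations discussed in the second and third papers target exactly this search step.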
Selected papers
1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html
2. Billion-Scale Similarity Search with GPUs, Johnson et al., https://ieeexplore.ieee.org/document/8733051
3. JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping, Liu et al., https://dl.acm.org/doi/10.1145/3620665.3640360
Practical information
- General public
- Free