Towards Generative Models for Discrete Data

Event details
Date | 08.10.2025 |
Hour | 15:30 › 17:30 |
Speaker | Alba Carballo Castro |
Location | |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Maria Brbic
Thesis advisor: Prof. Pascal Frossard
Co-examiner: Prof. Florent Krzakala
Abstract
Generative modeling has achieved remarkable success in continuous domains, particularly through diffusion and flow-based methods. However, many domains of scientific and practical relevance are inherently discrete, including language, graphs, and biological data. While autoregressive architectures have dominated discrete generative modeling, recent adaptations of diffusion and flow matching have extended these paradigms from the continuous to the discrete setting. These approaches offer new perspectives on controllability, sample quality, and the modeling of the intricate dependencies that characterize discrete data. This write-up surveys the foundations of discrete diffusion and discrete flow matching, highlighting applications that range from language generation to molecular discovery. By outlining their principles and challenges, it aims to clarify how these methods pave the way toward more expressive and principled generative models for structured data.
Background papers
- Simple and Effective Masked Diffusion Language Models https://arxiv.org/pdf/2406.07524
- Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design https://arxiv.org/pdf/2402.04997
- DeFoG: Discrete Flow Matching for Graph Generation https://arxiv.org/pdf/2410.04263
Practical information
- General public
- Free