Fast Masked Diffusion Models For Large-Scale Reasoning

Event details
Date | 18.06.2025
Hour | 09:00 – 11:00
Speaker | Justin Deschenaux |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Volkan Cevher
Thesis advisor: Prof. Caglar Gulcehre
Co-examiner: Prof. Nicolas Flammarion
Abstract
Autoregressive (AR) large language models (LLMs) currently dominate generative sequence modeling, demonstrating remarkable success across natural language processing tasks, including complex domains like mathematics and coding.
However, the AR decomposition imposes certain constraints, for example on the neural network architecture, which requires causal masking in transformer decoders. This constraint can introduce artifacts, such as the "reversal curse" in reasoning tasks. Moreover, sequential token generation from LLMs is notably slow. While techniques such as speculative decoding can alleviate this issue, they add complexity and remain somewhat ad hoc within the causal generative modeling paradigm.
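As a rough illustration of the causal masking constraint mentioned above (a minimal NumPy sketch; the helper name and toy scores are illustrative, not taken from the thesis): position i may only attend to positions up to i, so scores for "future" positions are suppressed before the softmax.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean mask: position i may attend only to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Toy attention scores for a length-4 sequence; future positions are
# disabled by setting their scores to -inf before the softmax.
scores = np.random.randn(4, 4)
masked_scores = np.where(causal_mask(4), scores, -np.inf)
print(masked_scores)
```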
Recently, alternative sequence modeling paradigms, particularly masked diffusion models (MDMs), have demonstrated performance that rivals AR models, so these emerging approaches have the potential to shape the future of generative sequence modeling. Notably, MDMs inherently provide a flexible trade-off between generation speed and quality. Because strong MDM performance is a recent development, however, their capabilities and scalability remain less thoroughly explored than those of AR models.
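To make the speed/quality trade-off concrete, here is a toy sketch of masked-diffusion-style parallel decoding (confidence-based unmasking is one common choice; the function names, dummy model, and schedule are illustrative assumptions, not the speaker's method): fewer refinement steps unmask more tokens per step and decode faster, while more steps approach one-token-at-a-time quality.

```python
import numpy as np

MASK = -1  # illustrative id for the special [MASK] token

def mdm_decode(model, seq_len, num_steps):
    # Start from an all-masked sequence and unmask a slice of positions per step.
    seq = np.full(seq_len, MASK, dtype=int)
    for step in range(num_steps):
        still_masked = np.flatnonzero(seq == MASK)
        if still_masked.size == 0:
            break
        probs = model(seq)  # (seq_len, vocab) per-position token probabilities
        # Spread the remaining masked positions evenly over the remaining steps.
        k = int(np.ceil(still_masked.size / (num_steps - step)))
        confidence = probs[still_masked].max(axis=1)
        chosen = still_masked[np.argsort(-confidence)[:k]]  # most confident positions first
        seq[chosen] = probs[chosen].argmax(axis=1)
    return seq

# Dummy "model": random probabilities, only to show the control flow.
rng = np.random.default_rng(0)
dummy_model = lambda seq: rng.dirichlet(np.ones(50), size=seq.shape[0])
print(mdm_decode(dummy_model, seq_len=12, num_steps=3))   # 3 steps: ~4 tokens unmasked per step
print(mdm_decode(dummy_model, seq_len=12, num_steps=12))  # 12 steps: one token per step, AR-like pace
```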
This thesis investigates the trade-offs of non-AR sequence models, specifically focusing on masked diffusion models. Our primary objective is to enhance the discrete diffusion framework to challenge the dominance of AR models in reasoning tasks. We focus on improving the decoding latency, optimizing neural network architectures, and developing novel decoding algorithms, all with an emphasis on reasoning capabilities.
Selected papers
coming soon
Practical information
- General public
- Free
Contact
- edic@epfl.ch