BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Fast Masked Diffusion Models For Large-Scale Reasoning
DTSTART:20250618T090000
DTEND:20250618T110000
DTSTAMP:20260406T142517Z
UID:4f6c2d14dda7698bd1d3f2762368aab84b665f6d3448ff73edc2ee57
CATEGORIES:Conferences - Seminars
DESCRIPTION:Justin Deschenaux\nEDIC candidacy exam\nExam president: Prof. 
 Volkan Cevher\nThesis advisor: Prof. Caglar Gulcehre\nCo-examiner: Prof. N
 icolas Flammarion\n\nAbstract\nAutoregressive (AR) large language models (
 LLMs) currently dominate generative sequence modeling\, demonstrating rema
 rkable success across natural language processing tasks\, including comple
 x domains like mathematics and coding.\n\nHowever\, the AR decomposition i
 mposes certain constraints\, for example on the neural network architectur
 e\, which requires causal masking in transformer decoders. This requiremen
 t can introduce failure modes\, such as the "reversal curse" in reasoning ta
 sks. Moreover\, sequential token generation from LLMs is notably slow. Whi
 le techniques like speculative decoding can alleviate this issue\, they ad
 d complexity and remain somewhat ad hoc within the causal generative model
 ing paradigm.\n\nRecently\, alternative sequence modeling paradigms\, particu
 larly masked diffusion models (MDMs)\, have demonstrated performance that 
 rivals AR models. Hence\, these emerging approaches have the potential t
 o shape the future of sequence generative modeling. Notably
 \, MDMs inherently provide a flexible balance between generation speed and
  quality. Given that the enhanced performance of MDMs is a recent advancem
 ent\, their capabilities and scalability are not as thoroughly explored as
  those of AR models.\n\nThis thesis investigates the trade-offs of non-AR 
 sequence models\, specifically focusing on masked diffusion models. Our pr
 imary objective is to enhance the discrete diffusion framework to challeng
 e the dominance of AR models in reasoning tasks. We focus on reducing dec
 oding latency\, optimizing neural network architectures\, and developing n
 ovel decoding algorithms\, all with an emphasis on reasoning capabilit
 ies.\n\nSelected papers\n- Structured Denoising Diffusion Models in Discre
 te State-Spaces (D3PM)\, https://arxiv.org/abs/2107.03006\n- Simple and E
 ffective Masked Diffusion Language Models (MDLM)\, https://arxiv.org/abs/
 2406.07524\n- Large Language Diffusion Models (LLaDA)\, https://arxiv.org
 /abs/2502.09992\n
LOCATION:BC 410 https://plan.epfl.ch/?room==BC%20410
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
