BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:Methods for efficient LLM pre-training at scale
DTSTART:20260212T140000
DTEND:20260212T160000
DTSTAMP:20260416T071229Z
UID:244836d8ef5efb8bb91409d680fbfa43ef639d1bdda9da3fb9a70cbc
CATEGORIES:Conferences - Seminars
DESCRIPTION:Alejandro Hernandez Cano\nEDIC candidacy exam\nExam president:
  Prof. Antoine Bosselut\nThesis advisor: Prof. Martin Jaggi\nCo-examiner:
  Prof. Emmanuel Abbé\n\nAbstract\nAdoption and demand for ever-stronger
  language foundation models have been steadily increasing over the last
  decade. To obtain them\, it is crucial to start from a strong base
  model\, and thus pre-training remains an essential stage of the training
  pipeline\, especially as it uses the majority of computational
  resources. Investigating methods for efficient training at scale is
  therefore crucial for the field. In this work\, we review three papers
  that highlight the importance of transformer architecture components
  when one aims for efficient training\, and propose future work to
  further push this line of research.\n\nSelected papers\n\n	Understanding
  and Minimising Outlier Features in Neural Network Training.\n	Scaling
  FP8 training to trillion-token LLMs.\n	Scaling up Test-Time Compute
  with Latent Reasoning: A Recurrent Depth Approach.\n
LOCATION:INJ 326 https://plan.epfl.ch/?room=INJ%20326
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
