Methods for efficient LLM pre-training at scale
Event details
| Date | 12.02.2026 |
| Hour | 14:00 - 16:00 |
| Speaker | Alejandro Hernandez Cano |
| Location | |
| Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Antoine Bosselut
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. Emmanuel Abbé
Abstract
Adoption of and demand for ever-stronger language foundation models have been steadily increasing over the last decade. Obtaining such models requires starting from a strong base model, and pre-training therefore remains an essential stage of the training pipeline, especially as it consumes the majority of computational resources. Investigating methods for efficient training at scale is thus crucial for the field. In this work, we review three papers that highlight the importance of transformer architecture components when aiming for efficient training, and propose future work to further push this line of research.
Selected papers
Practical information
- General public
- Free