Towards Improving the Pretraining of Large Language Models


Event details

Date 31.10.2024
Hour 09:00 – 11:00
Speaker Zhengqing Wu
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Nicolas Flammarion

Abstract
Training large language models requires co-optimizing many hyperparameters (model size, learning rate, batch size, etc.) for multiple goals (training speed, compute efficiency, generalization performance, etc.), which makes the task complicated. Tackling this difficulty requires (1) understanding how each hyperparameter affects training, (2) co-optimizing the different hyperparameters, and (3) striking a balance between the different goals. In this talk, I will present three papers that discuss how each of these can be achieved.
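
As a rough illustration of the scaling-law perspective in background paper [1], the sketch below fits a power law L(N) = (N_c / N)^alpha to model size versus validation loss in log-log space. It is not taken from the talk or the paper: the data points, variable names, and extrapolation target are invented purely for illustration.

    # Minimal sketch: fitting a power-law scaling law L(N) = (N_c / N)^alpha.
    # All numbers are synthetic and purely illustrative.
    import numpy as np

    # Hypothetical (model size, validation loss) pairs.
    model_sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
    val_losses = np.array([4.20, 3.85, 3.50, 3.21, 2.95])

    # log L = -alpha * log N + alpha * log N_c, so fit a line in log-log space.
    slope, intercept = np.polyfit(np.log(model_sizes), np.log(val_losses), 1)
    alpha = -slope                    # power-law exponent
    n_c = np.exp(intercept / alpha)   # scale constant N_c

    print(f"alpha ~ {alpha:.3f}, N_c ~ {n_c:.3e}")
    # Extrapolate the fitted law to a larger model (illustrative only).
    print(f"predicted loss at N=1e10: {(n_c / 1e10) ** alpha:.2f}")

Fitting in log-log space turns the power law into a linear regression, which is how such exponents are typically estimated from a sweep of model sizes.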

Background papers
[1] Scaling Laws for Neural Language Models, https://arxiv.org/abs/2001.08361
[2] Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer, https://proceedings.neurips.cc/paper/2021/hash/8df7c2e3c3c3be098ef7b382bd2c37ba-Abstract.html
[3] On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, https://openreview.net/forum?id=H1oyRlYgg&noteId=H1oyRlYgg

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
