Towards Improving the Pretraining of Large Language Models


Event details

Date 31.10.2024
Hour 09:00 – 11:00
Speaker Zhengqing Wu
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Martin Jaggi
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Nicolas Flammarion

Abstract
Training large language models requires co-optimizing many hyperparameters (model size, learning rate, batch size, etc.) for multiple goals (training speed, compute efficiency, generalization performance, etc.), which makes the task complicated. Tackling this difficulty requires (1) understanding how each hyperparameter affects training, (2) co-optimizing the different hyperparameters, and (3) striking a balance between the different goals. In this talk, I will present three papers that discuss how each of these can be achieved.
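
As a rough illustration of the scaling-law perspective in background paper [1], the sketch below fits a power law L(N) = (N_c / N)^alpha to model size versus validation loss in log-log space. It is not taken from the talk or the paper: the data points, variable names, and extrapolation target are invented purely for illustration.

    # Minimal sketch: fitting a power-law scaling law L(N) = (N_c / N)^alpha.
    # All numbers are synthetic and purely illustrative.
    import numpy as np

    # Hypothetical (model size, validation loss) pairs.
    model_sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
    val_losses = np.array([4.20, 3.85, 3.50, 3.21, 2.95])

    # log L = -alpha * log N + alpha * log N_c, so fit a line in log-log space.
    slope, intercept = np.polyfit(np.log(model_sizes), np.log(val_losses), 1)
    alpha = -slope                    # power-law exponent
    n_c = np.exp(intercept / alpha)   # scale constant N_c

    print(f"alpha ~ {alpha:.3f}, N_c ~ {n_c:.3e}")
    # Extrapolate the fitted law to a larger model (illustrative only).
    print(f"predicted loss at N=1e10: {(n_c / 1e10) ** alpha:.2f}")

Fitting in log-log space turns the power law into a linear regression, which is how such exponents are typically estimated from a sweep of model sizes.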

Background papers
[1] Scaling Laws for Neural Language Models, https://arxiv.org/abs/2001.08361
[2] Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer, https://proceedings.neurips.cc/paper/2021/hash/8df7c2e3c3c3be098ef7b382bd2c37ba-Abstract.html
[3] On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, https://openreview.net/forum?id=H1oyRlYgg&noteId=H1oyRlYgg

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
