FLAIR seminar: The Mysterious Optimization Dynamics of Deep Learning

Event details
Date | 08.12.2023
Hour | 13:15 - 14:15
Speaker | Fabian Pedregosa
Location |
Category | Conferences - Seminars
Event Language | English

Gradient descent with large step sizes often exhibits a regime called the Edge of Stability, characterized by an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of this eigenvalue near 2/η, the largest curvature at which gradient descent with step size η remains stable (edge of stability). This behavior is inconsistent with several widespread assumptions in optimization, so understanding it is crucial for designing better training methods. In the first part of the talk I’ll describe empirical results providing evidence for the Edge of Stability phenomenon. In the second part I’ll describe a simple and tractable model, a quartic polynomial, that provably exhibits the Edge of Stability. Finally, in the third part I’ll present empirical results on the interplay between sharpness and step-size tuners; understanding this interplay is crucial for unlocking the full potential of automatic step-size tuners.
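
As an illustration of the dynamics described above, here is a minimal sketch in JAX of gradient descent on a scalar quartic, tracking the curvature f''(x) against the stability threshold 2/η. The particular quartic f(x) = (x² − 1)²/4, the step size, and the initialization are illustrative assumptions, not necessarily the model from the talk.

```python
import jax
import jax.numpy as jnp

# Illustrative scalar quartic (an assumption, not the talk's model):
# f(x) = (x^2 - 1)^2 / 4, with minima at x = ±1 where f''(x) = 2.
def loss(x):
    return (x ** 2 - 1.0) ** 2 / 4.0

grad_fn = jax.grad(loss)
curv_fn = jax.grad(jax.grad(loss))  # f''(x): the scalar analogue of sharpness

eta = 1.05           # large step size: stability threshold 2/eta ≈ 1.90 < 2
x = jnp.asarray(1.2)

for step in range(40):
    x = x - eta * grad_fn(x)
    if step % 5 == 0:
        # The minimum has curvature 2 > 2/eta, so gradient descent cannot
        # settle there; instead the iterates oscillate around it with
        # curvature hovering near the threshold.
        print(f"step {step:2d}  x = {float(x):+.4f}  "
              f"f''(x) = {float(curv_fn(x)):.3f}  2/eta = {2 / eta:.3f}")
```

Lowering eta below 1.0 raises the threshold 2/eta above 2, and the same iteration then converges to x = 1; the oscillatory regime appears only once the curvature at the minimum exceeds the threshold.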
Practical information
- Informed public
- Free
Organizer
- Lénaïc Chizat