FLAIR seminar: The Mysterious Optimization Dynamics of Deep Learning

Event details

Date 08.12.2023
Hour 13:15 – 14:15
Speaker Fabian Pedregosa
Location
Category Conferences - Seminars
Event Language English

Gradient descent with large step sizes often exhibits a regime called the Edge of Stability, characterized by an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of that eigenvalue near the largest value compatible with stable training (the edge of stability). This behavior is inconsistent with several widespread assumptions in optimization theory, so understanding this phenomenon is crucial for designing better training methods. In the first part of the talk I’ll describe empirical results providing evidence for this Edge of Stability phenomenon. In the second part I’ll describe a simple and tractable model consisting of a quartic polynomial that provably exhibits Edge of Stability. Finally, in the third part I’ll present empirical results describing the interplay between sharpness and step-size tuners. Understanding this interplay is crucial for unlocking the full potential of automatic step-size tuners.
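The dynamics described above can be reproduced in a few lines. The sketch below is a minimal one-dimensional illustration, not the quartic model from the talk: it runs gradient descent with a large step size η on the toy quartic f(x) = (x² − 1)²/2 and tracks the sharpness f''(x), which first grows as the iterate approaches the minimum (progressive sharpening) and then hovers near the classical stability threshold 2/η instead of converging.

```python
import numpy as np

# Toy quartic (not the speaker's model): f(x) = (x^2 - 1)^2 / 2.
# Its curvature f''(x) = 6x^2 - 2 equals 4 at the minimum x = 1.
def grad(x):
    return 2.0 * x * (x**2 - 1.0)

def sharpness(x):
    # In 1D, the largest Hessian eigenvalue is just f''(x).
    return 6.0 * x**2 - 2.0

eta = 0.6          # large step size: 2/eta ≈ 3.33 < 4, so the minimum is GD-unstable
x = 0.3            # start in a flat region (sharpness < 0)

for t in range(30):
    print(f"t={t:2d}  x={x: .3f}  sharpness={sharpness(x): .2f}  threshold 2/eta={2/eta:.2f}")
    x = x - eta * grad(x)
```

Running this, the sharpness rises toward 4 during the initial approach to the minimum, then the iterate settles into a bounded oscillation whose sharpness, averaged over the cycle, stays close to 2/η — a caricature of the two phases the abstract describes.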

Practical information

  • Informed public
  • Free

Organizer

  • Lénaïc Chizat
