Feature learning, lower-homogeneity, and normalization layers

Event details
Date | 21.03.2024
Hour | 11:15 – 12:15
Speaker | Matus Telgarsky |
Category | Conferences - Seminars |
Event Language | English |
The first half of this talk will describe the feature learning problem in deep learning optimization, its statistical consequences, and an approach to proving general theorems that relies heavily on normalization layers, which are common to all modern architectures but are typically treated as an analytic nuisance. The theorems cover two settings: concrete results for shallow networks, and abstract template theorems for general architectures. The shallow network results achieve globally maximal margins at the cost of large width, with no further assumptions, while the general architecture theorems give convergence rates to KKT points for a new, general class of architectures satisfying "partial lower-homogeneity".
The second half will be technical, demonstrating two core proof techniques. The first ingredient, essential to the shallow analysis, is a new mirror descent lemma, strengthening a beautiful idea discovered by Chizat and Bach. The second ingredient is the concept of "partial lower-homogeneity" and its consequences.
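As background (the abstract does not define these terms, so this is standard context rather than the speaker's definition): a predictor f(x; θ) is positively homogeneous of degree L in its parameters θ if

f(x; c·θ) = c^L · f(x; θ) for every scaling c > 0,

which holds, for example, for bias-free ReLU networks with L layers. The "partial lower-homogeneity" of the talk presumably relaxes this condition, e.g. to a one-sided bound or to only part of the parameters; the precise definition is part of the talk.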
Joint work with Danny Son; not currently on arXiv, but "coming soon".
Practical information
- Informed public
- Free
Organizer
- Lénaïc Chizat
Contact
- lenaic.chizat@epfl.ch