Feature learning, lower-homogeneity, and normalization layers

Event details
Date | 21.03.2024
Hour | 11:15 – 12:15
Speaker | Matus Telgarsky |
Category | Conferences - Seminars |
Event Language | English |
The first half of this talk will describe the feature learning problem in deep learning optimization, its statistical consequences, and an approach to proving general theorems that relies heavily on normalization layers, which are common to all modern architectures but are typically treated as an analytic nuisance. The theorems cover two settings: concrete results for shallow networks, and abstract template theorems for general architectures. The shallow network results achieve globally maximal margins at the cost of large width, with no further assumptions, while the general architecture theorems give convergence rates to KKT points for a new, general class of architectures satisfying "partial lower-homogeneity".
The second half will be technical, demonstrating two core proof techniques. The first ingredient, essential to the shallow analysis, is a new mirror descent lemma, strengthening a beautiful idea discovered by Chizat and Bach. The second ingredient is the concept of "partial lower-homogeneity" and its consequences.
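As background (the abstract does not define these terms, so this is standard context rather than the speaker's definition): a predictor f(x; θ) is positively homogeneous of degree L in its parameters θ if

f(x; c·θ) = c^L · f(x; θ) for every scaling c > 0,

which holds, for example, for bias-free ReLU networks with L layers. The "partial lower-homogeneity" of the talk presumably relaxes this condition, e.g. to a one-sided bound or to only part of the parameters; the precise definition is part of the talk.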
Joint work with Danny Son; not currently on arXiv, but "coming soon".
Practical information
- Informed public
- Free
Organizer
- Lénaïc Chizat
Contact
- lenaic.chizat@epfl.ch