AI Center Seminar - AI Fundamentals series - Dr. Denny Wu
The talk is jointly organized by the EPFL Foundations of Learning and AI Research (FLAIR) group and the EPFL AI Center.
Hosting professor: Prof. Lenka Zdeborová
For logistics purposes, please register using your EPFL email address: HERE.
The talk will be followed by a coffee session.
Title
Learning shallow neural networks in high dimensions: SGD dynamics and scaling laws
Abstract
We study the sample and time complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with M orthogonal neurons on isotropic Gaussian data. We focus on the challenging "extensive-width" regime M ≫ 1 and allow for a large condition number in the second-layer parameters, covering the power-law scaling a_m = m^{-\beta} as a special case. We characterize the SGD dynamics of the student two-layer network during training and identify sharp transition times for the recovery of each signal direction. In the power-law setting, our analysis shows that while the learning of individual teacher neurons exhibits abrupt phase transitions, the juxtaposition of emergent learning curves at different timescales results in a smooth scaling law in the cumulative objective.
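To make the setting concrete, here is a minimal toy sketch of the teacher-student setup the abstract describes: a teacher with M orthogonal neurons and power-law second-layer weights a_m = m^{-β}, and a student trained by online SGD on fresh isotropic Gaussian samples. The activation (tanh), the fixed second layer, and all hyperparameters are illustrative assumptions, not the setup analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, beta, lr, steps = 256, 16, 1.5, 0.05, 50_000

act = np.tanh                               # assumed activation (illustrative)
def dact(z):
    return 1.0 - np.tanh(z) ** 2

# Teacher: M orthogonal neurons (first M coordinate directions) with
# power-law second-layer weights a_m = m^{-beta} (large condition number).
W_star = np.eye(d)[:M]                      # shape (M, d), orthonormal rows
a_star = np.arange(1, M + 1) ** (-beta)

# Student: same width, random first-layer init; the second layer is kept
# fixed at the teacher values here purely for simplicity of the sketch.
W = rng.normal(size=(M, d)) / np.sqrt(d)
a = a_star.copy()

for t in range(steps):
    x = rng.normal(size=d)                  # fresh sample each step => online SGD
    y = a_star @ act(W_star @ x)            # teacher label
    pre = W @ x
    err = a @ act(pre) - y                  # residual of the squared loss
    # Gradient of 0.5 * err**2 with respect to each student row:
    W -= lr * err * (a * dact(pre))[:, None] * x[None, :]

# Per-neuron overlap with the closest teacher direction; directions with
# larger a_m are expected to be recovered earlier (sharp per-neuron transitions).
W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
overlap = np.abs(W_unit @ W_star.T).max(axis=1)
print(np.round(overlap, 2))
```

Running this and tracking the overlaps over time gives the qualitative picture from the abstract: each overlap jumps abruptly at its own timescale, while the cumulative loss, summed across neurons, decays smoothly.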
Bio
Denny Wu is a Faculty Fellow at the Center for Data Science, New York University, and the Center for Computational Mathematics, Flatiron Institute. He completed his Ph.D. at the University of Toronto and the Vector Institute, advised by Jimmy Ba and Murat A. Erdogdu. Prior to that, he completed his undergraduate studies at Carnegie Mellon University, supervised by Ruslan Salakhutdinov.
Practical information
- Informed public
- Registration required
Organizer
- FLAIR Group & EPFL AI Center