BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Memento EPFL//
BEGIN:VEVENT
SUMMARY:External FLAIR seminar: Yuhai Tu
DTSTART:20221021T131500
DTEND:20221021T141500
DTSTAMP:20260510T075552Z
UID:73aa27302ecae3edda7dd566c35ffc615557f96518864c9bd0cdccf5
CATEGORIES:Conferences - Seminars
DESCRIPTION:Yuhai Tu\nTitle: Can physicists help understand Deep
  Learning?\nSpeaker: Yuhai Tu (IBM T. J. Watson Research Center)\n\n
 Abstract: Despite the great success of deep learning\, it remains
  largely a black box. In this seminar\, we will describe our recent
  work on understanding the learning dynamics and generalization of
  deep neural networks based on concepts and tools from statistical
  physics.\n\n(1) SGD learning dynamics: The main search engine in
  deep neural networks is the Stochastic Gradient Descent (SGD)
  algorithm; however\, little is known about how SGD finds “good”
  solutions (low generalization error) in the high-dimensional weight
  space. By studying weight fluctuations in SGD\, we find a robust
  inverse relation between the weight variance in SGD and the
  landscape flatness\, which is the opposite of the
  fluctuation-dissipation (response) relation in equilibrium
  statistical physics. We show that the noise strength in SGD depends
  inversely on the landscape flatness\, which explains the inverse
  variance-flatness relation. Our study suggests that SGD serves as
  an “intelligent” annealing strategy in which the effective
  temperature self-adjusts according to the loss landscape\, allowing
  it to find the flat minimum regions that contain generalizable
  solutions. Finally\, we discuss an application of these insights for
  efficiently reducing catastrophic forgetting in sequential
  multi-task learning [1].\n\n(2) Geometric determinants of
  generalization: We first report the discovery of duality relations
  between changes in the activities of a densely connected layer of
  neurons and changes in the weights connecting it to the next layer.
  The activity-weight duality leads to an explicit expression for the
  generalization loss\, which can be decomposed into contributions
  from different directions in weight space. We find that the
  generalization loss from each direction is the product of two
  geometric factors (determinants): the sharpness of the loss
  landscape at the solution and the standard deviation of the dual
  weights\, which scales as an activity-weighted norm of the solution.
  Using this decomposition of the generalization loss\, we uncover how
  hyperparameters in SGD\, different regularization schemes (e.g.\,
  weight decay and dropout)\, training data size\, and labeling noise
  affect generalization by controlling one or both factors [2].\n\n
 [1] “The inverse variance-flatness relation in
  Stochastic-Gradient-Descent is critical for finding flat minima”\,
  Y. Feng and Y. Tu\, PNAS\, 118 (9)\, 2021.\n[2] “The activity-weight
  duality in feed forward neural networks: The geometric determinants
  of generalization”\, Y. Feng and Y. Tu\,
  https://arxiv.org/abs/2203.10736
LOCATION:GA 3 21 https://plan.epfl.ch/?room==GA%203%2021
STATUS:CONFIRMED
END:VEVENT
END:VCALENDAR
