IC Colloquium: Opening the Black Box: Towards Theoretical Understanding of Deep Learning
 
        Event details
| Date | 15.02.2021 | 
| Hour | 14:00 - 15:00 | 
| Location | Online | 
| Category | Conferences - Seminars | 
      By: Wei Hu - Princeton University
IC Faculty candidate
Abstract
Despite the phenomenal empirical successes of deep learning in many application domains, its underlying mathematical mechanisms remain poorly understood. Mysteriously, deep neural networks in practice can often fit training data perfectly and generalize remarkably well to unseen test data, despite highly non-convex optimization landscapes and significant over-parameterization. Moreover, deep neural networks show an extraordinary ability to perform representation learning: feature representations extracted from a trained neural network can be useful for other related tasks.
In this talk, I will present our recent progress on building the theoretical foundations of deep learning by opening the black box of the interactions among data, model architecture, and training algorithm. First, I will show that gradient descent on deep linear neural networks induces an implicit regularization effect towards low rank, which explains the surprising generalization behavior of deep linear networks on the low-rank matrix completion problem. Next, turning to nonlinear deep neural networks, I will talk about a line of studies on wide neural networks, where, by drawing a connection to neural tangent kernels, we can answer various questions, such as how training loss is minimized, why a trained network can generalize, and why certain components of the network architecture are useful; we also use theoretical insights to design a new, simple, and effective method for training on noisily labeled datasets. Finally, I will analyze the statistical aspect of representation learning and identify key data conditions that enable efficient use of training data, bypassing a known hurdle in the i.i.d. tasks setting.
Bio
Wei Hu is a PhD candidate in the Department of Computer Science at Princeton University, advised by Sanjeev Arora. Previously, he obtained his B.E. in Computer Science from Tsinghua University. He has also spent time as a research intern at research labs of Google and Microsoft. His current research interests lie broadly in the theoretical foundations of modern machine learning. In particular, his main focus is on obtaining a solid theoretical understanding of deep learning, as well as using theoretical insights to design practical and principled machine learning methods. He is a recipient of the Siebel Scholarship, Class of 2021.
Practical information
- General public
- Free
- This event is internal
Contact
- Host: Martin Jaggi