From Theoretical Understanding of Neural Networks to Practical Applications

Event details
Date | 10.07.2023 |
Hour | 10:30 – 12:30 |
Speaker | Yongtao Wu |
Location | |
Category | Conferences - Seminars |
DIC candidacy exam
Exam president: Prof. Nicolas Flammarion
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Martin Jaggi
Abstract
Deep learning has demonstrated unprecedented success in influential applications ranging from vision tasks to language modeling. The design of the network architecture plays a pivotal role in performance, as evidenced by the development of ResNet, EfficientNet, and the Transformer. These achievements have sparked deep interest in the theoretical understanding of neural networks across topics such as convergence, generalization, and learnability, an understanding that can in turn contribute significantly to practical applications. In this write-up, we first delve into the convergence of feedforward neural networks. Subsequently, we examine a study of the Transformer from the perspective of generalization. Lastly, we introduce a theoretical work on in-context learning within the Transformer model.
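As a concrete illustration of the first topic, the sketch below trains an over-parameterized two-layer ReLU network by plain gradient descent on random data, loosely following the setting analyzed in the first background paper (first layer trained, second-layer signs fixed at random). It is an illustration only, not the paper's construction; the data, width, step size, and iteration count are arbitrary assumptions chosen for demonstration.

```python
# Minimal sketch (illustrative assumptions only): gradient descent on an
# over-parameterized two-layer ReLU network, in the spirit of Du et al.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2048                  # n samples, input dimension d, hidden width m >> n
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)

W = rng.normal(size=(m, d))            # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)    # second-layer signs (kept fixed)

def predict(W):
    """f(x_i) = (1/sqrt(m)) * sum_r a_r * ReLU(w_r . x_i) for every sample."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

lr = 0.5
for t in range(1000):
    err = predict(W) - y                             # residuals f(x_i) - y_i
    act = (X @ W.T > 0).astype(float)                # ReLU activation pattern, n x m
    grad = ((err[:, None] * act) * a[None, :]).T @ X / np.sqrt(m)  # d(0.5*||err||^2)/dW
    W -= lr * grad
    if t % 200 == 0:
        print(t, 0.5 * np.sum(err ** 2))
# With sufficient over-parameterization the training loss should decrease toward zero,
# the convergence phenomenon the paper studies via the Gram (NTK) matrix.
```

With the large width used here, the ReLU activation pattern barely changes during training, which is the mechanism behind the convergence guarantee established in the paper.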
Background papers
- 'Gradient Descent Provably Optimizes Over-parameterized Neural Networks', Du et al., ICLR 2019.
- 'A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity', Li et al., ICLR 2023.
- 'Transformers learn in-context by gradient descent', von Oswald et al., ICML 2023.
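The last topic of the abstract, in-context learning as gradient descent, admits a short numerical check: for in-context linear regression with the weights initialized at zero, a single gradient descent step yields the same query prediction as an unnormalized, linear-attention-style update. The sketch below is a toy check of this identity, not the full Transformer construction of the third background paper; the dimensions and the step size eta are arbitrary assumptions.

```python
# Toy numerical check (not the paper's full construction): one gradient descent
# step on in-context least squares, starting from W = 0, matches the query
# prediction of an unnormalized linear-attention-style update.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 8                        # input dimension, number of in-context pairs
X = rng.normal(size=(n, d))        # in-context inputs x_1..x_n
w_true = rng.normal(size=d)
y = X @ w_true                     # in-context targets y_i = w_true . x_i
x_q = rng.normal(size=d)           # query input
eta = 0.1                          # illustrative step size

# (a) Explicit gradient descent: one step on L(W) = 0.5 * sum_i (W x_i - y_i)^2,
#     starting from W = 0, then predict on the query.
W = np.zeros(d)
grad = X.T @ (X @ W - y)           # gradient of the in-context squared loss
W_after = W - eta * grad
pred_gd = W_after @ x_q

# (b) Linear-attention-style update: unnormalized scores x_q . x_i, values y_i,
#     scaled by eta.
scores = X @ x_q                   # one score per in-context example
pred_attn = eta * scores @ y

print(pred_gd, pred_attn)          # the two predictions coincide (up to float error)
```

The printed values coincide up to floating-point error; this is the identity the paper builds on to show that linear self-attention layers can emulate steps of gradient descent on the in-context examples.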
Practical information
- General public
- Free