From Theoretical Understanding of Neural Networks to Practical Applications

Event details
Date | 10.07.2023 |
Hour | 10:30 – 12:30 |
Speaker | Yongtao Wu |
Location | |
Category | Conferences - Seminars |
DIC candidacy exam
Exam president: Prof. Nicolas Flammarion
Thesis advisor: Prof. Volkan Cevher
Co-examiner: Prof. Martin Jaggi
Abstract
Deep learning has demonstrated unprecedented success in influential applications ranging from vision tasks to language modeling. The design of the network architecture plays a pivotal role in performance, as evidenced by the development of ResNet, EfficientNet, and the Transformer. These achievements have sparked deep interest in the theoretical understanding of neural networks across topics such as convergence, generalization, and learnability, an understanding that can in turn contribute significantly to practical applications. In this write-up, we first delve into the convergence of feedforward neural networks. Subsequently, we examine a study of the Transformer from the perspective of generalization. Lastly, we introduce a theoretical work on in-context learning within the Transformer model.
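As a concrete illustration of the first topic, the sketch below trains an over-parameterized two-layer ReLU network by plain gradient descent on random data, loosely following the setting analyzed in the first background paper (first layer trained, second-layer signs fixed at random). It is an illustration only, not the paper's construction; the data, width, step size, and iteration count are arbitrary assumptions chosen for demonstration.

```python
# Minimal sketch (illustrative assumptions only): gradient descent on an
# over-parameterized two-layer ReLU network, in the spirit of Du et al.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2048                  # n samples, input dimension d, hidden width m >> n
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)

W = rng.normal(size=(m, d))            # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)    # second-layer signs (kept fixed)

def predict(W):
    """f(x_i) = (1/sqrt(m)) * sum_r a_r * ReLU(w_r . x_i) for every sample."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

lr = 0.5
for t in range(1000):
    err = predict(W) - y                             # residuals f(x_i) - y_i
    act = (X @ W.T > 0).astype(float)                # ReLU activation pattern, n x m
    grad = ((err[:, None] * act) * a[None, :]).T @ X / np.sqrt(m)  # d(0.5*||err||^2)/dW
    W -= lr * grad
    if t % 200 == 0:
        print(t, 0.5 * np.sum(err ** 2))
# With sufficient over-parameterization the training loss should decrease toward zero,
# the convergence phenomenon the paper studies via the Gram (NTK) matrix.
```

With the large width used here, the ReLU activation pattern barely changes during training, which is the mechanism behind the convergence guarantee established in the paper.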
Background papers
- 'Gradient Descent Provably Optimizes Over-parameterized Neural Networks', Du et al., ICLR 2019.
- 'A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity', Li et al., ICLR 2023.
- 'Transformers learn in-context by gradient descent', von Oswald et al., ICML 2023.
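The last topic of the abstract, in-context learning as gradient descent, admits a short numerical check: for in-context linear regression with the weights initialized at zero, a single gradient descent step yields the same query prediction as an unnormalized, linear-attention-style update. The sketch below is a toy check of this identity, not the full Transformer construction of the third background paper; the dimensions and the step size eta are arbitrary assumptions.

```python
# Toy numerical check (not the paper's full construction): one gradient descent
# step on in-context least squares, starting from W = 0, matches the query
# prediction of an unnormalized linear-attention-style update.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 8                        # input dimension, number of in-context pairs
X = rng.normal(size=(n, d))        # in-context inputs x_1..x_n
w_true = rng.normal(size=d)
y = X @ w_true                     # in-context targets y_i = w_true . x_i
x_q = rng.normal(size=d)           # query input
eta = 0.1                          # illustrative step size

# (a) Explicit gradient descent: one step on L(W) = 0.5 * sum_i (W x_i - y_i)^2,
#     starting from W = 0, then predict on the query.
W = np.zeros(d)
grad = X.T @ (X @ W - y)           # gradient of the in-context squared loss
W_after = W - eta * grad
pred_gd = W_after @ x_q

# (b) Linear-attention-style update: unnormalized scores x_q . x_i, values y_i,
#     scaled by eta.
scores = X @ x_q                   # one score per in-context example
pred_attn = eta * scores @ y

print(pred_gd, pred_attn)          # the two predictions coincide (up to float error)
```

The printed values coincide up to floating-point error; this is the identity the paper builds on to show that linear self-attention layers can emulate steps of gradient descent on the in-context examples.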
Practical information
- General public
- Free