Two Analyses of Modern Deep Learning: Graph Neural Networks and Language Model Finetuning


Event details

Date 27.02.2024
Hour 11:15–12:15
Speaker Noam Razin (Tel Aviv University)
Location
Category Conferences - Seminars
Event Language English

The resurgence of deep learning was largely driven by architectures conceived in the 20th century and trained using labeled data. In recent years, deep learning has undergone paradigm shifts characterized by new architectures and training regimes. Despite the popularity of these new paradigms, their theoretical understanding is limited. In this talk, I will present two recent works analyzing aspects of modern deep learning. The first considers the expressive power of graph neural networks and formally quantifies their ability to model interactions between vertices. As a practical application of the theory, I will introduce a simple edge sparsification algorithm that achieves state-of-the-art results. The second work identifies a fundamental vanishing gradients problem that occurs when reinforcement learning is used to finetune language models. I will demonstrate the detrimental effects of this phenomenon and present possible solutions. Lastly, I will conclude with an outlook on important questions raised by the advent of foundation models and possible tools for addressing them.
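To make the edge sparsification idea concrete, the toy sketch below greedily removes edges whose deletion least reduces the graph's total walk count. The walk-count objective, graph size, and edge budget here are illustrative assumptions, not the walk-index-based algorithm presented in the talk.

```python
import numpy as np

def total_walks(adj, length=3):
    """Total number of length-`length` walks in the graph (sum of A^length)."""
    return np.linalg.matrix_power(adj, length).sum()

def greedy_sparsify(adj, num_remove, length=3):
    """Greedily drop edges whose removal least reduces the total walk
    count. This proxy objective is for illustration only; it is not the
    walk index criterion from the talk."""
    adj = adj.copy()
    for _ in range(num_remove):
        us, vs = np.nonzero(np.triu(adj, k=1))  # candidate edges (u < v)
        if len(us) == 0:
            break
        best_edge, best_value = None, -np.inf
        for u, v in zip(us, vs):
            adj[u, v] = adj[v, u] = 0          # tentatively remove the edge
            value = total_walks(adj, length)   # walks that survive the removal
            adj[u, v] = adj[v, u] = 1          # put it back
            if value > best_value:
                best_edge, best_value = (u, v), value
        u, v = best_edge
        adj[u, v] = adj[v, u] = 0              # commit the least harmful removal
    return adj

# Usage on a small random graph.
rng = np.random.default_rng(0)
n = 8
upper = np.triu((rng.random((n, n)) < 0.4).astype(int), k=1)
adj = upper + upper.T                          # symmetric, no self-loops
sparse = greedy_sparsify(adj, num_remove=5)
print(adj.sum() // 2, "edges ->", sparse.sum() // 2, "edges")
```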
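The vanishing gradients phenomenon can also be illustrated in a toy setting. For a softmax policy over a handful of outputs, the exact gradient of the expected reward with respect to logit i is p_i(r_i − E[r]), which shrinks as the policy concentrates on a single output. The minimal numpy sketch below uses made-up reward values and a three-output policy as assumptions; it is not the setup or analysis from the talk.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def expected_reward_grad(logits, rewards):
    """Exact gradient of J(theta) = E_{y ~ softmax(theta)}[r(y)]
    with respect to the logits: dJ/dtheta_i = p_i * (r_i - E[r])."""
    p = softmax(logits)
    j = p @ rewards
    return p * (rewards - j), j

# Toy setup: three outputs; output 0 has low reward, yet the model
# grows increasingly confident in it (illustrative values).
rewards = np.array([0.2, 0.9, 1.0])
for peakedness in [0.0, 2.0, 5.0, 10.0]:
    logits = np.array([peakedness, 0.0, 0.0])
    grad, j = expected_reward_grad(logits, rewards)
    print(f"peakedness={peakedness:4.1f}  E[r]={j:.3f}  "
          f"||grad||={np.linalg.norm(grad):.2e}")
```

As the policy concentrates on the suboptimal output, the gradient norm decays toward zero even though the expected reward stays far below the maximum, which is the mechanism behind the vanishing gradients problem described above.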

Works covered in the talk were done in collaboration with Nadav Cohen, Tom Verbin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, and Etai Littwin.