Theoretical analysis of multiple descents in neural networks through random matrix theory

Event details
Date: 25.08.2022
Hour: 14:00 – 16:00
Speaker: Anastasia Remizova
Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Olivier Lévêque
Thesis advisor: Prof. Nicolas Macris
Co-examiner: Dr. Yanina Shkel
Abstract
While artificial neural networks are wildly successful, our theoretical understanding of them is quite limited. One of the oddities is the phenomenon of double descent: as model complexity increases, the test risk first decreases, then increases due to overfitting, and finally decreases again in the over-parametrized regime. These observations have led to a line of research exploring double descent dynamics in models that approximate neural networks.
This write-up focuses on three works relevant to this area. The first presents a range of experiments showcasing double descent in deep networks in various settings. The second introduces a method to compute the spectra of large block Gaussian matrices. This result is used in the third work, which analyzes test risk by deriving the asymptotics of Neural Tangent Kernel regression for a two-layer neural network and identifies cases of multiple descent.
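The double descent curve described above can be observed in very simple models. The following is a minimal sketch, not taken from the seminar materials, that uses random-feature ridgeless regression with a linear teacher and ReLU features as a stand-in for a two-layer network: the test risk typically falls, peaks near the interpolation threshold (number of features close to the number of training samples), and then falls again as the model becomes heavily over-parametrized. All parameter choices (dimensions, noise level, feature counts) are illustrative assumptions.

```python
# Minimal double-descent sketch: random-feature regression with a frozen
# first layer and a minimum-norm least-squares readout (ridgeless limit).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 1000, 30

# Assumed ground-truth linear teacher with small label noise.
w_star = rng.normal(size=d) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

for n_features in [20, 50, 100, 150, 190, 200, 210, 250, 400, 800, 1600]:
    # Random first-layer weights stay fixed; only the readout is trained,
    # mimicking the random-features approximation of a 2-layer network.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Phi_tr = np.maximum(X_tr @ W, 0)  # ReLU features
    Phi_te = np.maximum(X_te @ W, 0)
    # Minimum-norm least-squares solution for the readout weights.
    a = np.linalg.pinv(Phi_tr) @ y_tr
    test_risk = np.mean((Phi_te @ a - y_te) ** 2)
    print(f"features={n_features:5d}  test risk={test_risk:.3f}")
```

Plotting the printed test risk against the number of features shows the non-monotone curve: the peak around n_features ≈ n_train corresponds to the interpolation threshold discussed in the background papers.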
Background papers
- Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., & Sutskever, I. (2020). Deep Double Descent: Where Bigger Models and More Data Hurt. In 8th International Conference on Learning Representations (ICLR 2020). https://par.nsf.gov/servlets/purl/10204300
Base paper (without supplementary material): 12 pages.
- Far, R. R., Oraby, T., Bryc, W., & Speicher, R. Spectra of Large Block Matrices. https://mast.queensu.ca/~speicher/papers/block.pdf
Chapters 1, 2, 3, two examples from Chapter 5 (e.g. 5.1 and 5.2), and Chapter 6: around 22 pages. Chapter 4 contains the proof of the theorem; knowledge of the proof is not required to understand the other parts of the paper.
- Adlam, B., & Pennington, J. (2020). The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization. arXiv preprint arXiv:2008.06786. https://arxiv.org/abs/2008.06786
Base paper plus Sections S1, S2, and S3 from the supplementary material: 19 pages.
Practical information
- General public
- Free
Contact
- edic@epfl.ch