Theoretical analysis of multiple descents in neural networks through random matrix theory

Event details
Date: 25.08.2022
Hour: 14:00 – 16:00
Speaker: Anastasia Remizova
Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Olivier Lévêque
Thesis advisor: Prof. Nicolas Macris
Co-examiner: Dr. Yanina Shkel
Abstract
While artificial neural networks are wildly successful, our theoretical understanding of them is quite limited. One of the oddities is the phenomenon of double descent: as model complexity increases, the test risk first decreases, then increases due to overfitting, and finally decreases again in the over-parametrized regime. These observations have led to a line of research exploring double descent dynamics in models that approximate neural networks.
This write-up focuses on three works relevant to this area. The first presents a range of experiments showcasing double descent in deep networks in various settings. The second introduces a method to compute the spectra of large block Gaussian matrices. This result is used in the third work, which analyzes test risk by deriving the asymptotics of Neural Tangent Kernel regression for a two-layer neural network and identifies cases of multiple descent.
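The double descent curve described above can be observed in very simple models. The following is a minimal sketch, not taken from the seminar materials, that uses random-feature ridgeless regression with a linear teacher and ReLU features as a stand-in for a two-layer network: the test risk typically falls, peaks near the interpolation threshold (number of features close to the number of training samples), and then falls again as the model becomes heavily over-parametrized. All parameter choices (dimensions, noise level, feature counts) are illustrative assumptions.

```python
# Minimal double-descent sketch: random-feature regression with a frozen
# first layer and a minimum-norm least-squares readout (ridgeless limit).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 1000, 30

# Assumed ground-truth linear teacher with small label noise.
w_star = rng.normal(size=d) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

for n_features in [20, 50, 100, 150, 190, 200, 210, 250, 400, 800, 1600]:
    # Random first-layer weights stay fixed; only the readout is trained,
    # mimicking the random-features approximation of a 2-layer network.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Phi_tr = np.maximum(X_tr @ W, 0)  # ReLU features
    Phi_te = np.maximum(X_te @ W, 0)
    # Minimum-norm least-squares solution for the readout weights.
    a = np.linalg.pinv(Phi_tr) @ y_tr
    test_risk = np.mean((Phi_te @ a - y_te) ** 2)
    print(f"features={n_features:5d}  test risk={test_risk:.3f}")
```

Plotting the printed test risk against the number of features shows the non-monotone curve: the peak around n_features ≈ n_train corresponds to the interpolation threshold discussed in the background papers.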
Background papers
- Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., & Sutskever, I. (2020). Deep Double Descent: Where Bigger Models and More Data Hurt. In 8th International Conference on Learning Representations (ICLR 2020). https://par.nsf.gov/servlets/purl/10204300
Base paper (without supplementary material): 12 pages.
- Far, R. R., Oraby, T., Bryc, W., & Speicher, R. Spectra of Large Block Matrices. https://mast.queensu.ca/~speicher/papers/block.pdf
Chapters 1, 2, 3, two examples from Chapter 5 (e.g. 5.1 and 5.2), and Chapter 6: around 22 pages. Chapter 4 contains the proof of the theorem; knowledge of the proof is not required to understand the other parts of the paper.
- Adlam, B., & Pennington, J. (2020). The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization. arXiv preprint arXiv:2008.06786. https://arxiv.org/abs/2008.06786
Base paper plus Sections S1, S2, and S3 from the supplementary material: 19 pages.
Practical information
- General public
- Free
Contact
- edic@epfl.ch