Theoretical analysis of multiple descents in neural networks through random matrix theory


Event details

Date 25.08.2022
Hour 14:00–16:00
Speaker Anastasia Remizova
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Olivier Lévêque
Thesis advisor: Prof. Nicolas Macris
Co-examiner: Dr. Yanina Shkel

Abstract
While artificial neural networks are wildly successful, our theoretical understanding of them remains limited. One of the oddities is the phenomenon of double descent: as model complexity increases, the test risk first decreases, then increases due to overfitting, and finally decreases again in the over-parametrized regime. These observations have led to a line of research exploring double descent dynamics in models that approximate neural networks.
This write-up focuses on three works in this area. The first presents a range of experiments showcasing double descent in deep networks across various settings. The second introduces a method to compute the spectra of large block Gaussian matrices. This result is used in the third work, which derives the high-dimensional asymptotics of the test risk of a 2-layer neural network under Neural Tangent Kernel regression and discovers cases of multiple descent.
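To make the phenomenon concrete, here is a minimal numerical sketch of double descent, assuming a standard setup from the literature rather than a model taken from the background papers: a min-norm least-squares fit on random ReLU features of a noisy linear teacher, where the test risk peaks near the interpolation threshold p = n. All names here (e.g., test_risk) are ours, chosen for illustration.

  import numpy as np

  rng = np.random.default_rng(0)

  # Synthetic linear-teacher data: y = <beta, x> + noise.
  n_train, n_test, d = 100, 1000, 50
  beta = rng.standard_normal(d) / np.sqrt(d)
  X_train = rng.standard_normal((n_train, d))
  X_test = rng.standard_normal((n_test, d))
  y_train = X_train @ beta + 0.1 * rng.standard_normal(n_train)
  y_test = X_test @ beta

  def test_risk(p):
      """Fit min-norm least squares on p random ReLU features; return test MSE."""
      W = rng.standard_normal((d, p)) / np.sqrt(d)  # fixed random first-layer weights
      F_train = np.maximum(X_train @ W, 0.0)        # ReLU random features
      F_test = np.maximum(X_test @ W, 0.0)
      a = np.linalg.pinv(F_train) @ y_train         # min-norm (pseudoinverse) solution
      return np.mean((F_test @ a - y_test) ** 2)

  # Sweep the number of features through the interpolation threshold p = n_train:
  for p in [10, 50, 90, 100, 110, 200, 500, 1000]:
      print(f"p = {p:4d}  test MSE = {test_risk(p):.4f}")

Running this, the test error typically spikes near p = n_train = 100 and falls again in the over-parametrized regime, mirroring the descent-ascent-descent curve described above.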

Background papers
  1. Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., & Sutskever, I. (2020). Deep Double Descent: Where Bigger Models and More Data Hurt. In 8th International Conference on Learning Representations (ICLR) 2020. https://par.nsf.gov/servlets/purl/10204300
     Base paper (w/o supplementary) - 12 pages.
     
  2. Far, R. R., Oraby, T., Bryc, W., & Speicher, R. Spectra of large block matrices. https://mast.queensu.ca/~speicher/papers/block.pdf
     Chapters 1, 2, 3, and 6, plus two examples from Chapter 5 (e.g., 5.1 and 5.2) - around 22 pages. Chapter 4 contains a proof of the main theorem; knowledge of the proof is not required to understand the other parts of the paper. A small numerical illustration of such a block matrix appears after this list.
     
  3. Adlam, B., & Pennington, J. (2020). The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization. arXiv preprint arXiv:2008.06786. https://arxiv.org/abs/2008.06786
     Base paper + Chapters S1, S2, and S3 from the supplementary - 19 pages.
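The block-matrix setting of reference 2 can be previewed empirically. Below is a minimal Monte-Carlo sketch; the block pattern is chosen arbitrarily for illustration and is not an example from the paper. It samples a symmetric 2x2-block Gaussian matrix and prints a text histogram of its eigenvalues, the empirical counterpart of the limiting density that the paper's fixed-point equations characterize.

  import numpy as np

  rng = np.random.default_rng(1)
  N = 500  # each block is N x N; the full matrix is 2N x 2N

  # Independent Gaussian blocks: A is a symmetrized (GOE-type) block,
  # B is an iid Gaussian block, and the lower-right block is zero.
  G = rng.standard_normal((N, N))
  A = (G + G.T) / np.sqrt(2 * N)
  B = rng.standard_normal((N, N)) / np.sqrt(N)
  Z = np.zeros((N, N))

  M = np.block([[A, B],
                [B.T, Z]])  # symmetric 2x2 block matrix

  # Empirical spectrum; as N grows, the histogram approaches the limiting
  # density predicted by the operator-valued (block) analysis.
  eigvals = np.linalg.eigvalsh(M)
  hist, edges = np.histogram(eigvals, bins=40, density=True)
  for h, left in zip(hist, edges):
      print(f"{left:+6.2f} | " + "#" * int(80 * h))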

Practical information

  • General public
  • Free

Contact

  • edic@epfl.ch

Tags

EDIC candidacy exam
