Mechanisms of Learning in Neural Networks: Scaling, Dynamics, and Optimization

Event details
Date | 11.06.2025
Hour | 14:00 – 16:00
Speaker | Yizhou Xu
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Nicolas Macris
Thesis advisor: Prof. Lenka Zdeborová
Co-examiner: Prof. Lénaïc Chizat
Abstract
This report reviews three recent advances in the high-dimensional analysis of deep learning, focusing on optimization, learning dynamics, and generalization. First, [1] introduces a belief propagation-based algorithm for training discrete neural networks, offering an alternative to gradient-based methods. Second, [2] characterizes the training dynamics and the emergence of task specialization in multi-head attention during in-context learning. Third, [3] derives scaling laws for the generalization error of random feature regression, establishing a deterministic equivalence with infinite-width models. While these works exemplify how high-dimensional limits can yield tractable asymptotics for neural networks, significant gaps remain between the theoretical settings and practical architectures. We conclude by outlining open questions to bridge these gaps.
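As a concrete illustration of the setting in [3], the following is a minimal sketch of random feature regression: a fixed random first layer followed by ridge regression on the second-layer weights, whose test error is the quantity described by the scaling laws and deterministic equivalents. The ReLU feature map, the linear teacher, and all dimensions and hyperparameters below are illustrative assumptions, not taken from the paper.

# Minimal random feature regression sketch (illustrative, not the paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 100, 500, 400        # input dimension, number of random features, training samples
lam = 1e-2                     # ridge regularization strength

W = rng.standard_normal((p, d)) / np.sqrt(d)      # fixed random first-layer weights
beta_star = rng.standard_normal(d) / np.sqrt(d)   # hypothetical linear teacher

def features(X):
    # ReLU random features phi(x) = max(W x, 0)
    return np.maximum(X @ W.T, 0.0)

# Training data generated by the teacher, with small label noise.
X_train = rng.standard_normal((n, d))
y_train = X_train @ beta_star + 0.1 * rng.standard_normal(n)

# Ridge regression on the random features: only the second layer is learned.
Phi = features(X_train)
a_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y_train)

# Generalization error, whose dependence on n and p the scaling laws characterize.
X_test = rng.standard_normal((2000, d))
test_error = np.mean((features(X_test) @ a_hat - X_test @ beta_star) ** 2)
print(f"test error: {test_error:.4f}")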
Selected papers
1. Deep learning via message passing algorithms based on belief propagation (https://iopscience.iop.org/article/10.1088/2632-2153/ac7d3b/pdf)
2. Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality (https://arxiv.org/abs/2402.19442)
3. Dimension-free deterministic equivalents and scaling laws for random feature regression (https://arxiv.org/pdf/2405.15699)
Practical information
- General public
- Free