Advances in Mathematics of Deep Learning

Event details
Date: 07.07.2023
Hour: 13:00 - 15:00
Speaker: Hristo Papazov
Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Nicolas Boumal
Thesis advisor: Prof. Nicolas Flammarion
Co-examiner: Prof. Lénaïc Chizat
Abstract
In this research proposal, we will consider how, for a fixed objective function, model class, and optimization algorithm, different parametrizations of the model and different initializations of the parameters influence the optimization trajectory and the generalization properties of the learning procedure. In other words, we will investigate the implicit bias induced by the choice of parametrization and initialization, and the following three papers will guide our discussion:
-- "An Asymptotical Variational Principle Associated with the Steepest Descent Method for a Convex Function" by Lemaire;
-- "Kernel and Rich Regimes in Overparametrized Models" by Woodworth et al.;
-- "Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent" by Li et al.
As a rough roadmap for the report, we will first develop, following Lemaire, some foundational tools for analyzing the trajectory and limit of a convex gradient flow. Then, we will study a class of reparametrized linear models from Woodworth et al. in which the geometry and scale of the initialized parameters lead to distinct optimization and generalization behavior. In this setting, we cannot directly apply the techniques from Lemaire because we are optimizing a nonconvex loss. Fortunately, the optimization procedure can be rephrased as a mirror flow on a convex loss, to which the tools from Lemaire carry over. Finally, we consider Li et al., who give precise necessary and sufficient conditions under which a gradient flow on a reparametrized model can be reformulated as a mirror flow with a Legendre function on the "effective" parameters, so that we can again reap the benefits of a convex loss.
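To make the roadmap above concrete, here is a minimal sketch of the gradient-flow-to-mirror-flow correspondence in the diagonal linear network setting of Woodworth et al.; the symbols u, v, \beta, R, and \alpha are illustrative notation chosen for this summary and not necessarily those of the papers. The effective parameters are

\[
\beta(u, v) = u \odot u - v \odot v,
\]

and gradient flow on the nonconvex parameter loss L(\beta(u, v)),

\[
\dot{u}(t) = -\nabla_u L\bigl(\beta(u(t), v(t))\bigr),
\qquad
\dot{v}(t) = -\nabla_v L\bigl(\beta(u(t), v(t))\bigr),
\]

induces dynamics on \beta that can be rewritten as a mirror flow with respect to a Legendre potential R,

\[
\frac{\mathrm{d}}{\mathrm{d}t}\, \nabla R\bigl(\beta(t)\bigr) = -\nabla L\bigl(\beta(t)\bigr),
\]

where R is determined by the parametrization and the initialization scale \alpha: in Woodworth et al. it is a hyperbolic-entropy-type potential whose induced implicit bias interpolates between \ell_2-like (kernel) behavior for large \alpha and \ell_1-like (rich) behavior for small \alpha. Li et al. then characterize exactly when such a Legendre function exists for a general reparametrization.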
Background papers
- An Asymptotical Variational Principle Associated with the Steepest Descent Method for a Convex Function: https://www.heldermann-verlag.de/jca/jca03/jca03005.pdf
- Kernel and Rich Regimes in Overparametrized Models: https://arxiv.org/abs/2002.09277
- Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent: https://arxiv.org/abs/2207.04036
Practical information
- General public
- Free
Contact
- edic@epfl.ch