Advances in Mathematics of Deep Learning

Event details
Date: 07.07.2023
Hour: 13:00 - 15:00
Speaker: Hristo Papazov
Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Nicolas Boumal
Thesis advisor: Prof. Nicolas Flammarion
Co-examiner: Prof. Lénaïc Chizat
Abstract
In this research proposal, we will consider how, for a fixed objective function, model class, and optimization algorithm, different parametrizations of the model and different initializations of the parameters influence the optimization trajectory and the generalization properties of the learning procedure. In other words, we will investigate the implicit bias induced by the choice of parametrization and initialization, and the following three papers will guide our discussion:
-- "An Asymptotical Variational Principle Associated with the Steepest Descent Method for a Convex Function" by Lemaire;
-- "Kernel and Rich Regimes in Overparametrized Models" by Woodworth et al.;
-- "Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent" by Li et al.
As a rough roadmap for the report, we will first develop, following Lemaire, some foundational tools for analyzing the trajectory and limit of a convex gradient flow. Then, we will study a class of reparametrized linear models from Woodworth et al. in which the geometry and scale of the initialized parameters lead to distinct optimization and generalization behavior. In this setting, we cannot directly apply the techniques from Lemaire because we are optimizing a nonconvex loss. Fortunately, the optimization procedure can be rephrased as a mirror flow on a convex loss, to which the tools from Lemaire carry over. Finally, we consider Li et al., who give precise necessary and sufficient conditions under which a gradient flow on a reparametrized model can be reformulated as a mirror flow with a Legendre function on the "effective" parameters, so that we can again reap the benefits of a convex loss.
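To make the roadmap above concrete, here is a minimal sketch of the gradient-flow-to-mirror-flow correspondence in the diagonal linear network setting of Woodworth et al.; the symbols u, v, \beta, R, and \alpha are illustrative notation chosen for this summary and not necessarily those of the papers. The effective parameters are

\[
\beta(u, v) = u \odot u - v \odot v,
\]

and gradient flow on the nonconvex parameter loss L(\beta(u, v)),

\[
\dot{u}(t) = -\nabla_u L\bigl(\beta(u(t), v(t))\bigr),
\qquad
\dot{v}(t) = -\nabla_v L\bigl(\beta(u(t), v(t))\bigr),
\]

induces dynamics on \beta that can be rewritten as a mirror flow with respect to a Legendre potential R,

\[
\frac{\mathrm{d}}{\mathrm{d}t}\, \nabla R\bigl(\beta(t)\bigr) = -\nabla L\bigl(\beta(t)\bigr),
\]

where R is determined by the parametrization and the initialization scale \alpha: in Woodworth et al. it is a hyperbolic-entropy-type potential whose induced implicit bias interpolates between \ell_2-like (kernel) behavior for large \alpha and \ell_1-like (rich) behavior for small \alpha. Li et al. then characterize exactly when such a Legendre function exists for a general reparametrization.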
Background papers
- An Asymptotical Variational Principle Associated with the Steepest Descent Method for a Convex Function: https://www.heldermann-verlag.de/jca/jca03/jca03005.pdf
- Kernel and Rich Regimes in Overparametrized Models: https://arxiv.org/abs/2002.09277
- Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent: https://arxiv.org/abs/2207.04036
Practical information
- General public
- Free
Contact
- edic@epfl.ch