A New Approach to Gradient-based Convex and Non-convex Optimization: Beyond Lipschitz Gradients


Event details

Date 28.04.2023
Hour 11:00–12:00
Speaker Professor Ali Jadbabaie, Massachusetts Institute of Technology, USA
Location
Category Conferences - Seminars
Event Language English
Abstract
Classical textbook analyses of convex and non-convex optimization problems often assume that gradients are Lipschitz continuous, thereby largely restricting attention to objective functions bounded by quadratics. In recent work [1], we generalized this standard smoothness condition to the setting where the Lipschitz constant of the gradient (or, when it exists, the norm of the Hessian) is bounded by a positive, affine function of the gradient norm. This class of functions is far more general than quadratics and contains (univariate) polynomials and exponential functions as special cases. In this talk, we revisit this generalized smoothness condition and use it to develop a simple yet powerful analysis technique that yields much stronger results for both convex and non-convex optimization. The new approach is based on carefully bounding the gradient norm along the optimization trajectory.

Using this technique, we show that in the convex setting, gradient descent (GD) converges under a very general smoothness condition: all that is needed is for the Hessian norm to be bounded by a continuous function of the gradient norm. In the non-convex setting, when the local smoothness (or Hessian norm) is a sub-quadratic function of the gradient norm, we show that GD, SGD with constant stepsize, and Adam (a popular adaptive method used in deep learning) provably converge under very general conditions. For SGD, we show that under this more general smoothness condition, the standard assumption of bounded noise made in previous works can be relaxed to mere boundedness of the noise variance. For Adam, we show convergence without assuming Lipschitzness of the objective function. Finally, we propose a variance-reduced version of Adam with an accelerated convergence rate that is tight. For all of the above settings and algorithms, we prove convergence rates that match the state of the art without restrictive assumptions. [Joint work with PhD student Haochuan Li]
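For concreteness, the affine bound on the local smoothness described above (the generalized smoothness condition introduced in [1]) can be restated, purely as an illustration and with $L_0, L_1 \ge 0$ denoting constants, as

\[
\|\nabla^2 f(x)\| \;\le\; L_0 + L_1\,\|\nabla f(x)\| \qquad \text{for all } x .
\]

This recovers classical $L$-smoothness when $L_1 = 0$, and it is satisfied, for example, by $f(x) = e^x$, since then $\|\nabla^2 f(x)\| = \|\nabla f(x)\|$. The results described in the talk further relax the affine right-hand side to any sub-quadratic function of $\|\nabla f(x)\|$ in the non-convex case, and to any continuous function of $\|\nabla f(x)\|$ in the convex case.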

[1] Zhang, J., He, T., Sra, S., & Jadbabaie, A. (2020). Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity. In International Conference on Learning Representations (ICLR 2020).
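As background, [1] analyzes gradient clipping, whose update scales down the step whenever the gradient norm is large, which is exactly the regime in which the generalized smoothness condition allows the local Lipschitz constant to grow. The following is a minimal Python sketch of the standard clipped gradient descent update; the function names, parameters, and test objective are illustrative choices, not code from the talk or the paper.

import numpy as np

def clipped_gradient_descent(grad, x0, eta=0.1, gamma=1.0, iters=1000):
    # Standard clipped-GD update: x <- x - eta * min(1, gamma / ||g||) * g.
    # The effective stepsize shrinks when ||g|| is large, matching the regime
    # where the local smoothness constant grows with the gradient norm.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break
        x = x - eta * min(1.0, gamma / norm) * g
    return x

# Illustrative objective f(x) = e^x + e^{-x}: its Hessian grows with the
# gradient norm, so f is not globally Lipschitz-smooth, yet clipped GD
# still converges to the minimizer x = 0.
x_min = clipped_gradient_descent(lambda x: np.exp(x) - np.exp(-x), x0=[5.0])
print(x_min)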


Biography

Ali Jadbabaie is the JR East Professor and Head of the Department of Civil and Environmental Engineering at the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, where he is also a core faculty member in the Institute for Data, Systems, and Society (IDSS) and a Principal Investigator with the Laboratory for Information and Decision Systems. Previously, he served as Director of the Sociotechnical Systems Research Center and as Associate Director of IDSS, which he helped found in 2015. He is the co-creator of the Social and Engineering Systems PhD program within IDSS, an interdisciplinary program that combines information sciences and social sciences to address complex societal problems across domains.
He received a B.S. degree with High Honors in electrical engineering, with a focus on control systems, from Sharif University of Technology, an M.S. degree in electrical and computer engineering from the University of New Mexico, and a Ph.D. degree in control and dynamical systems from the California Institute of Technology. He was a Postdoctoral Scholar at Yale University before joining the faculty at the University of Pennsylvania, where he was promoted through the ranks and held the Alfred Fitler Moore Professorship in Network Science in the Department of Electrical and Systems Engineering. He is a recipient of a US National Science Foundation Career Development Award, a US Office of Naval Research Young Investigator Award, the O. Hugo Schuck Best Paper Award from the American Automatic Control Council, and the George S. Axelby Best Paper Award from the IEEE Control Systems Society. He has been the senior author of several student best-paper award winners at conferences including ACC, IEEE CDC, and IEEE ICASSP. He is an IEEE Fellow and the recipient of a Vannevar Bush Fellowship from the Office of the Secretary of Defense. His research interests are broadly in systems theory, decision theory and control, and optimization theory, as well as multiagent systems, collective decision making in social and economic networks, and computational social science. He served as the inaugural Editor-in-Chief of the IEEE Transactions on Network Science and Engineering, an interdisciplinary journal sponsored by several IEEE societies, and as an Associate Editor for the INFORMS journal Operations Research.