IC Colloquium: Memory-Efficient Adaptive Optimization for Humungous-Scale Learning
Event details
Date | 21.10.2019 |
Hour | 16:15 › 17:30 |
Location | |
Category | Conferences - Seminars |
By: Yoram Singer - Princeton University
Video of his talk
Abstract
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. I start by giving a general overview of adaptive gradient methods. I then describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of classical adaptive methods. I give convergence guarantees for the method and demonstrate its effectiveness in training some of the largest deep models.
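To make the memory trade-off concrete, here is a minimal sketch contrasting a classical AdaGrad-style update, which stores one second-moment entry per parameter, with a factored variant that keeps only per-row and per-column statistics for a matrix-shaped parameter. The factored update is illustrative of the sublinear-memory idea described in the talk, not the exact algorithm; all names here are ours.

```python
import numpy as np

def adagrad_step(param, grad, accum, lr=0.1, eps=1e-8):
    """Classical AdaGrad: one accumulated squared-gradient entry per
    parameter, so optimizer state is as large as the model itself."""
    accum += grad ** 2
    param -= lr * grad / (np.sqrt(accum) + eps)
    return param, accum

def factored_step(param, grad, row_accum, col_accum, lr=0.1, eps=1e-8):
    """Sublinear-memory sketch: for an (m, n) parameter, keep only m
    per-row and n per-column statistics (m + n values instead of m * n)
    and combine them on the fly into an upper estimate of the full
    accumulator. Illustrative only -- not the talk's exact method."""
    row_accum = np.maximum(row_accum, np.max(grad ** 2, axis=1))
    col_accum = np.maximum(col_accum, np.max(grad ** 2, axis=0))
    # Per-entry estimate: the smaller of the row and column statistics.
    est = np.minimum.outer(row_accum, col_accum)
    param -= lr * grad / (np.sqrt(est) + eps)
    return param, row_accum, col_accum
```

For an embedding table with millions of rows, replacing the full accumulator with row and column statistics shrinks optimizer state from O(mn) to O(m + n), which is what lets larger models or mini-batches fit in the same memory budget.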
Bio
Yoram Singer is a professor of Computer Science at Princeton University. He was a member of the technical staff at AT&T Research 1995-1999, an associate professor at the Hebrew University 1999-2007, and a Principal Scientist at Google 2005-2019. At Google, he implemented and launched Google’s Domain Spam classifier used for all search queries 2004-2017, co-founded the Sibyl system which served YouTube predictions 2008-2018, and founded the Principles Of Effective Machine-learning group and Google’s AI Lab at Princeton. He co-chaired COLT’04 and NIPS’04. He is a fellow of AAAI.
More information
Practical information
- General public
- Free
- This event is internal
Contact
- Host: Martin Jaggi