IC Colloquium: Memory-Efficient Adaptive Optimization for Humungous-Scale Learning


Event details

Date 21.10.2019
Hour 16:15 – 17:30
Category Conferences - Seminars
By: Yoram Singer - Princeton University

Abstract
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. I start by giving a general overview of adaptive gradient methods. I then describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of classical adaptive methods. I give convergence guarantees for the method and demonstrate its effectiveness in training some of the largest deep models.
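The abstract contrasts classical adaptive methods, which keep a second-moment statistic for every parameter, with an adaptive method whose memory cost is sublinear. The sketch below is a minimal NumPy illustration of that contrast for a single m-by-n weight matrix: a full AdaGrad-style accumulator versus compact row and column accumulators in the spirit of the SM3 algorithm. The function names, hyperparameters, and the row/column cover are illustrative assumptions, not necessarily the exact method presented in the talk.

```python
import numpy as np

def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    """Classical AdaGrad: one accumulator entry per parameter,
    which roughly doubles the optimizer's memory footprint."""
    acc += g ** 2                               # O(m * n) extra memory
    w -= lr * g / (np.sqrt(acc) + eps)
    return w, acc

def sublinear_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    """Sketch of a sublinear-memory adaptive step (SM3-style row/column
    cover, an assumption here): only O(m + n) statistics are stored."""
    # Per-entry second-moment estimate rebuilt from the compact accumulators.
    nu = np.minimum(row_acc[:, None], col_acc[None, :]) + g ** 2
    w -= lr * g / (np.sqrt(nu) + eps)
    # Fold the new squared gradients back into the row/column maxima.
    return w, nu.max(axis=1), nu.max(axis=0)

# Toy usage on a 3x4 weight matrix with a random gradient.
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 4))
g = rng.standard_normal((3, 4))
acc = np.zeros_like(w)                          # full-size statistics
row_acc, col_acc = np.zeros(3), np.zeros(4)     # sublinear statistics
w_full, acc = adagrad_step(w.copy(), g, acc)
w_compact, row_acc, col_acc = sublinear_step(w.copy(), g, row_acc, col_acc)
```

For a dense m-by-n layer, the compact accumulators store m + n values instead of m * n, which is one way to obtain the sublinear memory cost described in the abstract.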

Bio
Yoram Singer is a professor of Computer Science at Princeton University. He was a member of the technical staff at AT&T Research from 1995 to 1999, an associate professor at the Hebrew University from 1999 to 2007, and a Principal Scientist at Google from 2005 to 2019. At Google, he implemented and launched Google’s Domain Spam classifier, used for all search queries from 2004 to 2017; co-founded the Sibyl system, which served YouTube predictions from 2008 to 2018; founded the Principles Of Effective Machine-learning group; and founded Google’s AI Lab at Princeton. He co-chaired COLT’04 and NIPS’04. He is a fellow of AAAI.


Practical information

  • General public
  • Free
  • This event is internal

Contact

  • Host: Martin Jaggi
