IC Colloquium: Second-Order Model Compression at Scale: How to Efficiently Run Your 175-Billion Parameter Model on a Single GPU


Event details

Date 03.11.2022
Hour 16:15 – 17:30
Location Online
Category Conferences - Seminars
Event Language English
By: Dan Alistarh - IST Austria

Abstract
A key barrier to the wide deployment of highly accurate machine learning models is their high computational and memory overhead. Although we have the mathematical tools for highly accurate compression of such models, for instance via the Optimal Brain Surgeon framework (LeCun et al., 1990) and its many extensions, these theoretically elegant techniques require second-order (curvature) information of the model’s loss function, which is hard to even approximate efficiently at scale.
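
For readers unfamiliar with the framework, the sketch below spells out the classical Optimal Brain Surgeon step in plain NumPy. This is the textbook formulation, not the speaker's algorithm; the function name obs_prune_one and the dense inverse Hessian H_inv are illustrative assumptions. Its purpose is to show where the cost lies: H_inv is a d x d matrix over all d parameters, so at d ≈ 175 billion it cannot even be stored, let alone computed, which is exactly the barrier described above.

import numpy as np

# Minimal sketch of one classical OBS pruning step (textbook formulation,
# illustrative only). Assumes the full inverse Hessian H_inv of the loss
# at the current weights w is available -- the very object that is
# infeasible to form at 175-billion-parameter scale.
def obs_prune_one(w, H_inv):
    diag = np.diag(H_inv)
    saliency = w**2 / (2.0 * diag)               # predicted loss increase from zeroing each weight
    q = int(np.argmin(saliency))                 # cheapest weight to remove
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]  # optimal compensation of the remaining weights
    w_new = w + delta
    w_new[q] = 0.0                               # enforce an exact zero on the pruned weight
    return w_new, q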

In this talk, I will describe our work on bridging this computational divide, which enables, for the first time, accurate second-order pruning and quantization of models at truly massive scale. Our running example will be the 175-billion-parameter GPT-3/OPT language generation model: compressed using our techniques, it can now be run efficiently on a single GPU, with negligible accuracy loss.
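
To make the single-GPU claim concrete, here is a back-of-the-envelope memory estimate for the weights alone (my own illustrative arithmetic, ignoring activations and attention caches; not figures from the talk):

# Rough weight-memory footprint of a 175B-parameter model at different precisions
# (illustrative assumption: memory = parameter count x bits per weight / 8).
params = 175e9
gib = 2**30
print(f"fp16 : {params * 16 / 8 / gib:6.0f} GiB")   # ~326 GiB
print(f"4-bit: {params * 4 / 8 / gib:6.0f} GiB")    # ~81 GiB
print(f"3-bit: {params * 3 / 8 / gib:6.0f} GiB")    # ~61 GiB

At 16-bit precision the weights alone far exceed the memory of any single accelerator, whereas 3–4 bits per weight brings them near or under the 80 GB of a current data-center GPU, which is what makes aggressive quantization the enabling step.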

Bio
Dan Alistarh is a Professor at IST Austria, in Vienna. Previously, he was a Researcher with Microsoft and a Postdoc at MIT CSAIL. He received his PhD from EPFL, under the brilliant guidance of Prof. Rachid Guerraoui. His research is on algorithms for efficient machine learning and high-performance computing, with a focus on scalable DNN inference and training, for which he was awarded an ERC Starting Grant in 2018. In his spare time, he leads the ML research team at Neural Magic, a startup based in Boston, MA.


Practical information

  • General public
  • Free
  • This event is internal

Contact

  • Host: Mathieu Salzmann
