Talk by Professor Dan Alistarh (ISTA)
Event details
Date | 30.08.2024
Hour | 11:00 › 12:00
Speaker | Professor Dan Alistarh
Location |
Category | Conferences - Seminars
Event Language | English
Title: Accurate Model Compression at GPT Scale
Abstract: A key barrier to the wide deployment of highly accurate machine learning models, whether for language or vision, is their high computational and memory overhead. Although we possess the mathematical tools for highly accurate compression of such models, these theoretically elegant techniques require second-order information of the model's loss function, which is hard to even approximate efficiently at the scale of billion-parameter models. In this talk, I will describe our work on bridging this computational divide, which enables accurate second-order pruning and quantization of models at truly massive scale. Compressed using our techniques, models with billions and even trillions of parameters can be executed efficiently on a few GPUs, with significant speedups and negligible accuracy loss. Models created using our techniques have been downloaded millions of times from open-source repositories such as HuggingFace.
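The "second-order information" the abstract refers to is the Hessian of the model's loss. As a rough illustration only (the toy Hessian, weights, and variable names below are mine, not from the talk), a classic Optimal Brain Surgeon-style step scores each weight by the loss increase incurred by zeroing it, s_i = w_i^2 / (2 [H^-1]_ii), prunes the cheapest one, and corrects the remaining weights:

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * (w - w*)^T H (w - w*).
# All values here are illustrative assumptions, not from the talk.
H = np.array([[4.0, 0.5, 0.0],
              [0.5, 2.0, 0.1],
              [0.0, 0.1, 1.0]])   # toy Hessian (positive definite)
w = np.array([0.3, -1.2, 0.05])   # current weights

H_inv = np.linalg.inv(H)

# OBS saliency: estimated loss increase from zeroing each weight.
saliency = w**2 / (2.0 * np.diag(H_inv))

# Prune the weight with the smallest saliency, then apply the OBS
# correction delta = -(w_i / [H^-1]_ii) * H^-1[:, i] to compensate.
i = int(np.argmin(saliency))
delta = -(w[i] / H_inv[i, i]) * H_inv[:, i]
w_pruned = w + delta              # coordinate i is driven exactly to zero
```

The scaling challenge the talk addresses is that, for billion-parameter models, even approximating `H_inv` directly is infeasible, so the exact update above must be replaced by efficient structured approximations.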
Bio: Dan Alistarh is a Professor at ISTA. Previously, he was affiliated with ETH Zurich, MIT, and Microsoft Research, having received his PhD from EPFL under the guidance of Rachid Guerraoui. His research is on algorithms for efficient machine learning and high-performance computing, with a focus on scalable DNN inference and training, for which he was awarded ERC Starting and Proof-of-Concept Grants. In his spare time, he works with the ML research team at Neural Magic, a startup based in Boston, on making compression faster, more accurate, and more accessible to practitioners.
Practical information
- Informed public
- Free
Organizer
- Professor Volkan Cevher
Contact
- Gosia Baltaian