Gradient Compression Techniques to Accelerate Distributed Training of Neural Networks


Event details

Date 28.08.2019
Hour 10:30 - 12:30
Speaker Thijs Vogels
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Pascal Frossard
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. Babak Falsafi

Abstract
In distributed training of machine learning models with stochastic optimization, the exchange of parameter updates between workers is often a bottleneck that limits the scalability of training. This is especially true for models with a large parameter space, such as neural networks. Several techniques have been proposed to improve scalability by compressing gradients, e.g. by sending only a sparse subset of coordinates, or by quantizing them. We study the gradient compression literature from two angles: on the one hand, we examine the properties of these algorithms in a distributed setting and their effectiveness in terms of speed and scalability; on the other hand, we explore properties of the minima they find, such as robustness and generalisation.
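
To make the two compression families mentioned above concrete, here is a minimal NumPy sketch (function names are illustrative, not taken from any of the background papers) of top-k sparsification, which transmits only the k largest-magnitude coordinates, and 1-bit sign quantization, which transmits one bit per coordinate plus a single scale.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries; send (indices, values)."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def topk_decompress(idx, values, shape):
    """Reconstruct a dense gradient, zero everywhere except the sent entries."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = values
    return flat.reshape(shape)

def sign_compress(grad):
    """1-bit quantization: signs plus the mean absolute value as a scale."""
    return np.sign(grad), np.abs(grad).mean()

def sign_decompress(signs, scale):
    return scale * signs
```

In a data-parallel setup, each worker would compress its local gradient, exchange the compressed representation instead of the dense vector, and the decompressed (inexact) gradients would then be averaged for the SGD step.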

Background papers
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, by Alistarh et al., NIPS 2017.
ATOMO: Communication-efficient Learning via Atomic Sparsification, by Wang et al., NeurIPS 2018.
Error Feedback Fixes SignSGD and other Gradient Compression Schemes, by Karimireddy et al., ICML 2019.
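
The error-feedback mechanism studied in the third paper can be summarized in a few lines: each worker keeps a local residual of what compression discarded and adds it back before compressing the next gradient. The sketch below illustrates this under the assumption of a simple 1-bit sign compressor; the class and function names are hypothetical, not the paper's API.

```python
import numpy as np

def sign_compress(grad):
    # 1-bit quantizer: signs plus a single scale (mean absolute value)
    return np.sign(grad), np.abs(grad).mean()

def sign_decompress(signs, scale):
    return scale * signs

class ErrorFeedbackWorker:
    """Feeds the compression error back into the next gradient (error feedback)."""

    def __init__(self, shape):
        self.memory = np.zeros(shape)  # accumulated compression error

    def step(self, grad):
        corrected = grad + self.memory        # add back the previously lost part
        payload = sign_compress(corrected)    # what would be sent over the network
        approx = sign_decompress(*payload)    # what the receiver reconstructs
        self.memory = corrected - approx      # store the new residual locally
        return approx                         # used in place of the exact gradient
```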

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
