Gradient Compression Techniques to Accelerate Distributed Training of Neural Networks

Event details
Date | 28.08.2019
Hour | 10:30 – 12:30
Speaker | Thijs Vogels
Location |
Category | Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Pascal Frossard
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. Babak Falsafi
Abstract
In distributed training of machine learning models with stochastic optimization, the exchange of parameter updates between workers is often a bottleneck that limits scalability. This is especially true for models with a large parameter space, such as neural networks. Several techniques have been proposed to improve scalability by compressing gradients, e.g. by sending only a sparse set of coordinates, or by quantization. We study the gradient compression literature from both sides: on the one hand, we examine the behaviour of these algorithms in a distributed setting and their effectiveness for speed and scalability; on the other hand, we explore properties of the minima they find, such as robustness or generalisation.
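For illustration only (not part of the talk abstract), the two compressor families mentioned above, coordinate sparsification and quantization, can be sketched in a few lines of NumPy. The function names and the rescaling choice are assumptions for this sketch, not definitions taken from the listed papers.

    import numpy as np

    def topk_compress(grad, k):
        """Sparsification: keep only the k largest-magnitude coordinates."""
        out = np.zeros_like(grad)
        idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of the k largest entries
        out[idx] = grad[idx]                          # only these (index, value) pairs would be sent
        return out

    def sign_compress(grad):
        """1-bit quantization: send only the signs, rescaled by the mean magnitude."""
        return np.sign(grad) * np.mean(np.abs(grad))

In a data-parallel setting, each worker would apply such a compressor to its local gradient and communicate only the compressed representation instead of the full dense vector.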
Background papers
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, by Alistarh et al., NIPS 2017.
ATOMO: Communication-Efficient Learning via Atomic Sparsification, by Wang et al., NeurIPS 2018.
Error Feedback Fixes SignSGD and Other Gradient Compression Schemes, by Karimireddy et al., ICML 2019 (the error-feedback idea is sketched below).
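As a rough illustration of the error-feedback idea studied in the third paper, a minimal worker-local sketch (names and step size are assumptions, not code from the paper): the part of the gradient that the compressor discards is kept in a local memory and re-added before the next compression.

    def error_feedback_step(x, grad, memory, compress, lr=0.1):
        """One worker-local SGD step with error feedback wrapped around a compressor."""
        corrected = grad + memory     # re-add previously discarded information
        update = compress(corrected)  # only this compressed update is communicated
        memory = corrected - update   # remember what was lost this round
        return x - lr * update, memory

    # e.g. with sign_compress from the sketch above, starting from memory = 0
    # and carrying (x, memory) from step to step on each worker:
    # x, memory = error_feedback_step(x, grad, memory, sign_compress)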
Practical information
- General public
- Free
Contact
- edic@epfl.ch