Gradient Compression Techniques to Accelerate Distributed Training of Neural Networks

Event details
Date | 28.08.2019
Hour | 10:30 – 12:30
Speaker | Thijs Vogels
Location |
Category | Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Pascal Frossard
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. Babak Falsafi
Abstract
In distributed training of machine learning models with stochastic optimization, the exchange of parameter updates between workers is often a bottleneck that limits scalability. This is especially true for models with a large parameter space, such as neural networks. Several techniques have been proposed to improve scalability by compressing gradients, e.g. by sending only a sparse set of coordinates, or by quantization. We study the gradient compression literature from both sides: on the one hand, we examine the behaviour of these algorithms in a distributed setting and their effectiveness for speed and scalability; on the other hand, we explore properties of the minima they find, such as robustness or generalisation.
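For illustration only (not part of the talk abstract), the two compressor families mentioned above, coordinate sparsification and quantization, can be sketched in a few lines of NumPy. The function names and the rescaling choice are assumptions for this sketch, not definitions taken from the listed papers.

    import numpy as np

    def topk_compress(grad, k):
        """Sparsification: keep only the k largest-magnitude coordinates."""
        out = np.zeros_like(grad)
        idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of the k largest entries
        out[idx] = grad[idx]                          # only these (index, value) pairs would be sent
        return out

    def sign_compress(grad):
        """1-bit quantization: send only the signs, rescaled by the mean magnitude."""
        return np.sign(grad) * np.mean(np.abs(grad))

In a data-parallel setting, each worker would apply such a compressor to its local gradient and communicate only the compressed representation instead of the full dense vector.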
Background papers
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, by Alistarh et al., NIPS 2017.
ATOMO: Communication-Efficient Learning via Atomic Sparsification, by Wang et al., NeurIPS 2018.
Error Feedback Fixes SignSGD and Other Gradient Compression Schemes, by Karimireddy et al., ICML 2019 (the error-feedback idea is sketched below).
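As a rough illustration of the error-feedback idea studied in the third paper, a minimal worker-local sketch (names and step size are assumptions, not code from the paper): the part of the gradient that the compressor discards is kept in a local memory and re-added before the next compression.

    def error_feedback_step(x, grad, memory, compress, lr=0.1):
        """One worker-local SGD step with error feedback wrapped around a compressor."""
        corrected = grad + memory     # re-add previously discarded information
        update = compress(corrected)  # only this compressed update is communicated
        memory = corrected - update   # remember what was lost this round
        return x - lr * update, memory

    # e.g. with sign_compress from the sketch above, starting from memory = 0
    # and carrying (x, memory) from step to step on each worker:
    # x, memory = error_feedback_step(x, grad, memory, sign_compress)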
Practical information
- General public
- Free
Contact
- edic@epfl.ch