Decentralized Stochastic Optimization

Event details

Date 26.08.2019
Hour 14:00 – 16:00
Speaker Anastasiia Koloskova
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Volkan Cevher
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. Ali Sayed

Abstract
Decentralized optimization is a promising direction for optimizing machine learning models. It allows training to be distributed over a large number of computing devices (e.g., mobile phones) without moving users' data to central servers. Moreover, it can give significant speedups for training in datacenters over all-reduce SGD, which is the current state-of-the-art parallel SGD implementation. In this write-up we discuss some of the recent advances in decentralized optimization and its current weaknesses. We first consider communication compression techniques for speeding up centralized training. The second paper we discuss shows that the communication topology does not influence the leading term in the convergence rate of stochastic decentralized optimization, making it competitive with centralized approaches. Finally, we consider another technique for making communication in decentralized training more efficient: time-varying directed network graphs and asynchronous communication.
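
As a rough illustration of the methods surveyed above, the sketch below runs decentralized SGD with gossip averaging on a ring topology and compresses the local gradients with a simplified QSGD-style stochastic quantizer (omitting the compressed encoding of the original scheme). The toy least-squares objective, the number of nodes, the step size, and the quantization level are illustrative assumptions, not the setups used in the background papers.

```python
# Minimal sketch: decentralized SGD with gossip averaging and a
# QSGD-style stochastic quantizer. All problem parameters are toy values.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, n_samples = 8, 10, 100

# Each node holds a private least-squares objective f_i(x) = ||A_i x - b_i||^2 / (2m).
A = [rng.normal(size=(n_samples, dim)) for _ in range(n_nodes)]
x_true = rng.normal(size=dim)
b = [Ai @ x_true + 0.1 * rng.normal(size=n_samples) for Ai in A]

def stochastic_grad(i, x, batch=10):
    """Mini-batch gradient of node i's local objective."""
    idx = rng.integers(0, n_samples, size=batch)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ x - bi) / batch

def qsgd_quantize(v, levels=4):
    """Unbiased stochastic quantization in the spirit of QSGD."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(v.shape) < scaled - lower)  # randomized rounding
    return np.sign(v) * q * norm / levels

# Ring mixing matrix: each node averages its model with its two neighbours.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n_nodes] = 1 / 3
    W[i, (i + 1) % n_nodes] = 1 / 3

x = np.zeros((n_nodes, dim))  # one model copy per node
lr = 0.05
for t in range(500):
    # Local step with quantized stochastic gradients.
    grads = np.stack([qsgd_quantize(stochastic_grad(i, x[i])) for i in range(n_nodes)])
    x = x - lr * grads
    # Gossip step: neighbours exchange models and average.
    x = W @ x

print("mean distance to x_true:", np.linalg.norm(x.mean(axis=0) - x_true))
```

The quantizer is unbiased (its expectation equals the input gradient), which is the property that lets compressed SGD retain the usual convergence guarantees, and the doubly stochastic ring matrix W is one of the simplest communication topologies studied in the decentralized SGD literature.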

Background papers
QSGD: Communication-efficient SGD via gradient quantization and encoding, by Alistarh, D., et al. NIPS 2017.
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, by Lian, X., et al. NIPS 2017.
Stochastic Gradient Push for Distributed Deep Learning, by Assran, M., et al. ICML 2019.

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
