Decentralized Stochastic Optimization
Event details
Date: 26.08.2019
Hour: 14:00 – 16:00
Speaker: Anastasiia Koloskova
Location:
Category: Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Volkan Cevher
Thesis advisor: Prof. Martin Jaggi
Co-examiner: Prof. Ali Sayed
Abstract
Decentralized optimization is a promising direction for training machine learning models. It allows training to be distributed over a large number of computing devices (e.g. mobile phones) without moving users' data to central servers. Moreover, it can yield significant speedups for training in datacenters over all-reduce SGD, the current state-of-the-art parallel SGD implementation. In this write-up we discuss some recent advances in decentralized optimization and its current weaknesses. We first consider communication compression techniques for speeding up centralized training. The second paper we discuss shows that the communication topology does not influence the leading term of the convergence rate in stochastic decentralized optimization, making it competitive with centralized approaches. Finally, we consider another technique for making communication more efficient in decentralized training: time-varying directed network graphs with asynchronous communication.
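The communication-compression idea mentioned above can be illustrated with the unbiased stochastic quantization scheme from the QSGD paper cited below. This is a minimal sketch (function name and signature are our own, not from the paper): each gradient coordinate is stochastically rounded to one of s+1 levels of its magnitude relative to the vector norm, so the quantized gradient equals the true gradient in expectation.

```python
import numpy as np

def qsgd_quantize(v, s=4, rng=None):
    """QSGD-style unbiased stochastic quantization (Alistarh et al., 2017).

    Each coordinate |v_i| / ||v||_2 is rounded stochastically to one of
    the s+1 levels {0, 1/s, ..., 1}, so that E[Q(v)] = v.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    level = s * np.abs(v) / norm            # real-valued level in [0, s]
    lower = np.floor(level)
    p = level - lower                       # probability of rounding up
    xi = lower + (rng.random(v.shape) < p)  # stochastic rounding
    return norm * np.sign(v) * xi / s
```

Only the norm (one float) and the small integer levels with signs need to be transmitted, which is the source of the communication savings.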
Background papers
QSGD: Communication-efficient SGD via gradient quantization and encoding, by Alistarh, D., et al. NIPS 2017.
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, by Lian, X., et al. NIPS 2017.
Stochastic Gradient Push for Distributed Deep Learning, by Assran, M., et al. ICML 2019.
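The decentralized algorithm analyzed in the second paper (D-PSGD) alternates gossip averaging with neighbors and a local stochastic gradient step. A minimal sketch, assuming a doubly stochastic mixing matrix W (the helper function and the toy quadratic example are ours, not from the paper):

```python
import numpy as np

def dpsgd_step(X, W, grads, lr):
    """One D-PSGD update (Lian et al., 2017, notation simplified):
    each node averages neighbors' models via the mixing matrix W,
    then takes a local stochastic gradient step.

    X:     (n_nodes, dim) current local models
    W:     (n_nodes, n_nodes) doubly stochastic mixing matrix
    grads: (n_nodes, dim) local (stochastic) gradients
    """
    return W @ X - lr * grads

# Example: 4 nodes on a ring, node i minimizes f_i(x) = 0.5 * (x - b_i)^2
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
b = np.array([[1.0], [2.0], [3.0], [4.0]])
X = np.zeros((4, 1))
for _ in range(200):
    X = dpsgd_step(X, W, X - b, lr=0.1)   # grad of f_i at x_i is x_i - b_i
# All nodes end up near the global optimum mean(b) = 2.5,
# up to a consensus error that scales with the step size.
```

With a constant step size the nodes agree only up to an O(lr) consensus error; the paper's point is that, with an appropriate step-size schedule, the leading term of the convergence rate matches centralized SGD regardless of the topology encoded in W.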
Practical information
- General public
- Free