Scaling Distributed Deep Learning with Efficient Algorithm Design
Event details
Date | 13.06.2018
Hour | 10:00 – 12:00
Speaker | Tao LIN
Location |
Category | Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. François Fleuret
Thesis advisor: Prof. Martin Jaggi
Thesis co-advisor: Prof. Babak Falsafi
Co-examiner: Prof. Rachid Guerraoui
Abstract
Due to the rapid growth of data and ever-increasing model complexity, many of today's most important deep learning workloads can no longer be trained efficiently on a single machine. Distributed training architectures have been developed in response to these challenges, promising improved scalability by increasing both computational and storage capacity. A critical challenge in realizing this promise is to develop efficient methods for communicating and coordinating information between distributed devices, taking into account the specific needs of machine learning training algorithms. On most distributed systems, communicating information between devices is vastly more expensive than reading data from main memory and performing local computation. Moreover, the optimal trade-off between communication and computation can vary widely depending on the dataset being processed, the system resources available, and the training objective being optimized. This thesis addresses this challenge in order to improve the scalability of learning systems.
Background papers
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, by Alistarh, Dan, et al. Advances in Neural Information Processing Systems. 2017.
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, by Lin, Yujun, et al. arXiv preprint arXiv:1712.01887 (2017).
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, by Goyal, Priya, et al. arXiv preprint arXiv:1706.02677 (2017).
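The first two background papers study gradient compression as a way to reduce the communication cost described in the abstract. As a rough illustration of the core idea (a simplified sketch, not the papers' exact algorithms; the function name and number of quantization levels are illustrative), a QSGD-style scheme stochastically rounds each gradient coordinate to a small set of levels so that only a norm, signs, and small integers need to be transmitted, while the quantized gradient remains an unbiased estimate of the original:

```python
import numpy as np

def quantize(g, num_levels=4):
    """Stochastically quantize gradient vector g to `num_levels` uniform
    levels of its L2 norm (QSGD-style sketch).

    Stochastic rounding makes the result an unbiased estimate of g, so
    SGD convergence guarantees can be preserved while each coordinate is
    communicated as a sign plus a small integer level.
    """
    norm = np.linalg.norm(g)
    if norm == 0:
        return np.zeros_like(g)
    # Magnitudes scaled into [0, num_levels].
    scaled = np.abs(g) / norm * num_levels
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiased).
    levels = lower + (np.random.rand(*g.shape) < (scaled - lower))
    return np.sign(g) * levels * norm / num_levels

g = np.array([0.3, -0.2, 0.5, 0.0])
# Averaging many quantized copies recovers g, reflecting unbiasedness.
estimate = np.mean([quantize(g) for _ in range(20000)], axis=0)
```

Fewer levels mean fewer bits per coordinate but higher variance per step, which is one concrete instance of the communication–computation trade-off the thesis targets.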
Practical information
- General public
- Free
Contact
- EDIC - [email protected]