Towards Communication-Efficient Distributed Machine Learning Techniques

Event details

Date 25.06.2018
Hour 09:00 - 11:00
Speaker Arsany Guirguis
Location
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Patrick Thiran
Thesis advisor: Prof. Rachid Guerraoui
Co-examiner: Prof. Martin Jaggi

Abstract
Machine Learning (ML) has proven powerful in deriving useful information from the ever-increasing amount of data available on the Internet. To make the best of this massive amount of data, ML models are becoming larger and more complex. Yet, training such complex models on large datasets is beyond the capabilities of a single machine, so training is becoming distributed. Although distributing the learning task improves scalability, it introduces communication challenges. Existing work has addressed these challenges in some specific cases, but there is still room for advancing the state-of-the-art solutions.
In this proposal, I am going to present TensorFlow, a popular system for large-scale distributed machine learning, along with two ideas proposed to improve the performance of the communication layer. In my research, I am interested in communication challenges across different distributed ML environments.

Background papers
TensorFlow: A System for Large-Scale Machine Learning, by Abadi, M. et al.
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, by Zhang, H. et al.
Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds, by Hsieh, K. et al.

Practical information

  • General public
  • Free

Tags

EDIC candidacy exam
