Cost-Efficient Distributed Learning

Event details

Date 17.05.2023
Hour 14:00-15:00
Speaker Dr Rawad Bitar (TUM)
Location
Category Conferences - Seminars
Event Language English

We analyze distributed gradient descent, an iterative algorithm that evaluates the gradient of a loss function at the current model on the considered data. A main node distributes its data among n worker devices. At each iteration, the workers evaluate the gradient on the data assigned to them and return the result to the main node, which aggregates the obtained evaluations and updates the model. This setting is known to suffer from the presence of stragglers, i.e., slow or unresponsive workers, whose delays may outweigh the benefit of distributing the computations.
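The iteration described above can be sketched in a few lines of Python. This is a minimal illustration for a least-squares loss, not the speaker's actual system; the function name, the data split, and the learning rate are hypothetical, and the parallel worker computations are simulated by a sequential loop:

```python
import numpy as np

def distributed_gd_step(X, y, w, n_workers, lr=0.1):
    """One iteration of distributed gradient descent for least squares.

    Sketch: the main node splits the data among n_workers, each worker
    computes the gradient of the loss on its shard, and the main node
    averages all returned gradients before updating the model.
    """
    shards = np.array_split(np.arange(len(X)), n_workers)
    grads = []
    for idx in shards:  # in a real system these run in parallel on workers
        Xi, yi = X[idx], y[idx]
        grads.append(Xi.T @ (Xi @ w - yi) / len(idx))
    g = np.mean(grads, axis=0)  # aggregation at the main node
    return w - lr * g
```

Because every worker's result is awaited, a single straggler in this loop would delay the whole iteration, which is the problem the talk addresses.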

Under some constraints on the response times of the workers, ignoring the stragglers is equivalent to running stochastic gradient descent (SGD). Hence, the main node only aggregates the results returned by the fastest k ≤ n workers, where k is a fixed parameter. The choice of k, which can be optimized a priori, presents a trade-off between the total run-time of the algorithm and the quality of the obtained model.
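A sketch of this fastest-k aggregation, under the assumption (made here purely for illustration) that worker response times follow an exponential distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def fastest_k_step(X, y, w, n_workers, k, lr=0.1):
    """Aggregate only the gradients of the k fastest workers (sketch).

    Response times are simulated as exponential delays; discarding the
    n - k stragglers amounts to an SGD step on a random subset of shards.
    Returns the updated model and the time spent waiting this iteration.
    """
    shards = np.array_split(np.arange(len(X)), n_workers)
    delays = rng.exponential(1.0, size=n_workers)  # simulated response times
    fastest = np.argsort(delays)[:k]               # indices of the k fastest
    grads = [X[shards[i]].T @ (X[shards[i]] @ w - y[shards[i]]) / len(shards[i])
             for i in fastest]
    iter_time = np.sort(delays)[k - 1]             # wait for the k-th arrival
    return w - lr * np.mean(grads, axis=0), iter_time
```

A small k shortens each iteration (the k-th order statistic of the delays) but makes the gradient estimate noisier, which is exactly the trade-off the parameter k controls.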

In this talk, we design a strategy that adaptively increases k throughout the run-time of the algorithm to obtain the best run-time possible with the best quality of the model. To that end, we derive a relationship between the value of k, the time spent per iteration, and the quality of the obtained model. Based on this relationship, we derive the times at which the main node needs to increase k and hence reduce the communication cost.
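The adaptive strategy can be caricatured as a schedule over iterations. The switching points below are placeholders for the times the talk derives from the relationship between k, the time per iteration, and the model quality; the function and its arguments are hypothetical:

```python
def adaptive_k_schedule(n_iters, k_min, k_max, switch_points):
    """Return the value of k used at each iteration (sketch).

    Starts at k_min and increments k by one at each iteration listed in
    switch_points, never exceeding k_max. The switch points stand in for
    the analytically derived times at which the main node increases k.
    """
    ks, k = [], k_min
    for t in range(n_iters):
        if switch_points and t >= switch_points[0] and k < k_max:
            k += 1                           # wait for more workers from now on
            switch_points = switch_points[1:]
        ks.append(k)
    return ks
```

Early iterations tolerate a noisy gradient, so a small k keeps them fast; as the model nears convergence, a larger k improves the gradient quality at the cost of longer waits.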

We then introduce a more cost-efficient strategy that assigns tasks to only k workers per iteration. To learn which k workers are the fastest, we rely on the theory of multi-armed bandits. We design a cost-efficient scheme that assigns computational tasks to the workers while learning their speeds. We show that for a fixed cost (measured by the number of worker employments), the introduced strategy outperforms the previous schemes. However, it has a longer run-time due to the overhead of learning the workers' speeds.
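One standard way to cast this as a bandit problem is a UCB-style rule: treat each worker as an arm whose unknown mean delay must be estimated, and per round employ the k arms that look fastest after an optimistic exploration bonus. This is a generic sketch, not the scheme from the talk; the exponential delay model and the confidence bonus are illustrative assumptions:

```python
import numpy as np

def bandit_fastest_k(rates, k, n_rounds, seed=0):
    """Learn the k fastest of len(rates) workers while assigning tasks.

    Each worker i has unknown service rate rates[i]; a task sent to it
    returns after an exponential delay with mean 1/rates[i]. Each round,
    the k workers with the smallest lower confidence bound on their mean
    delay are employed. Returns the estimated set of the k fastest workers.
    """
    rng = np.random.default_rng(seed)
    n = len(rates)
    counts = np.zeros(n)
    mean_delay = np.zeros(n)
    for t in range(n_rounds):
        if t < n:
            chosen = [t]                      # pull every arm once to start
        else:
            lcb = mean_delay - np.sqrt(2 * np.log(t) / counts)
            chosen = np.argsort(lcb)[:k]      # optimistically fastest k arms
        for i in chosen:
            d = rng.exponential(1.0 / rates[i])          # observed delay
            counts[i] += 1
            mean_delay[i] += (d - mean_delay[i]) / counts[i]
    return set(np.argsort(mean_delay)[:k])
```

The exploration rounds spent on slow workers are the source of the run-time overhead mentioned above: the scheme must occasionally employ a worker it believes to be slow in order to refine its estimate.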

We validate the results and concretize the theoretical findings through numerical simulations.

This talk summarizes results from joint works with Antonia Wachter-Zeh, Deniz Gündüz, Maximilian Egger, Parimal Parag, Salim El Rouyahbe, Serge Kas Hanna and Venkat Dasaari.

 

Practical information

  • Informed public
  • Free

Organizer

  • IPG Seminar. Dr Bitar is hosted by Prof. Rüdiger Urbanke (LTHC)

Contact

  • Muriel Bardet (LTHC)
