Efficient gradient coding for mitigating stragglers within distributed machine learning

Event details

Date 24.09.2025
Hour 16:15 – 17:15
Speaker Prof. Aditya Ramamoorthy - Iowa State University
Location
Category Conferences - Seminars
Event Language English

Large-scale distributed learning is the workhorse of modern-day machine learning algorithms. A typical scenario consists of minimizing a loss function (depending on the dataset) with respect to a high-dimensional parameter vector. Workers typically compute gradients on their assigned dataset chunks and send them to the parameter server (PS), which aggregates them to compute either an exact or approximate version of the overall gradient of the relevant loss function. However, in large-scale clusters, many workers are prone to straggling, i.e., they run slower than their promised speed or fail outright. A gradient coding solution introduces redundancy within the assignment of chunks to the workers and uses coding-theoretic ideas to allow the PS to recover the overall gradient (exactly or approximately), even in the presence of stragglers. Unfortunately, most existing gradient coding protocols are inefficient from a computation perspective, as they coarsely classify workers as operational or failed; the potentially valuable work performed by slow workers (partial stragglers) is ignored.
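To make the gradient coding idea concrete, the following is a minimal Python sketch (with hypothetical function and parameter names) of one classical baseline, the fractional-repetition scheme: data chunks are replicated across groups of workers so that, with at most s stragglers, at least one group responds in full and the PS recovers the exact overall gradient by summing that group's messages. This only illustrates the redundancy-plus-decoding principle described above; it is not the more refined protocols that exploit the partial work of slow workers, which are the subject of the talk.

# Minimal sketch of a fractional-repetition gradient code.
# Assumptions: n_workers workers, tolerance of any s stragglers,
# (s + 1) divides n_workers, and the data is split into n_workers chunks.
import numpy as np

def assign_chunks(n_workers: int, s: int):
    """Return the list of chunk indices stored by each worker.

    Workers are split into (s + 1) groups; each group jointly holds all
    n_workers chunks, so every chunk is replicated (s + 1) times.
    """
    assert n_workers % (s + 1) == 0
    group_size = n_workers // (s + 1)
    chunks_per_worker = s + 1
    assignment = []
    for w in range(n_workers):
        pos = w % group_size                 # position of worker within its group
        start = pos * chunks_per_worker
        assignment.append(list(range(start, start + chunks_per_worker)))
    return assignment

def worker_message(worker_chunks, partial_grads):
    """Each worker sends the sum of the gradients of its assigned chunks."""
    return sum(partial_grads[c] for c in worker_chunks)

def decode(n_workers, s, received):
    """received: dict worker_id -> message (missing ids are stragglers).

    With at most s stragglers, at least one of the (s + 1) groups is complete
    (pigeonhole); summing that group's messages yields the full gradient.
    """
    group_size = n_workers // (s + 1)
    for g in range(s + 1):
        members = range(g * group_size, (g + 1) * group_size)
        if all(w in received for w in members):
            return sum(received[w] for w in members)
    raise RuntimeError("more than s stragglers; exact recovery not possible")

# Toy run: n = 6 workers, tolerate s = 2 stragglers, 6 data chunks.
n, s, dim = 6, 2, 4
rng = np.random.default_rng(0)
partial_grads = [rng.standard_normal(dim) for _ in range(n)]   # one gradient per chunk
assignment = assign_chunks(n, s)
received = {w: worker_message(assignment[w], partial_grads)
            for w in range(n) if w not in (1, 5)}              # workers 1 and 5 straggle
full_grad = decode(n, s, received)
assert np.allclose(full_grad, sum(partial_grads))

In this baseline every chunk is computed s + 1 times, which is exactly the kind of computational overhead that more efficient schemes aim to reduce, for instance by also using the partial work completed by slow workers.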

Practical information

  • Informed public
  • Free

Organizer

  • IPG Seminar (Michael Gastpar)