Machine Learning Training and Deployment in Disaggregated Architectures


Event details

Date 28.02.2023
Hour 14:00–16:00
Speaker Diana Andreea Petrescu
Category Conferences - Seminars
EDIC candidacy exam
Exam president: Prof. Jean-Yves Le Boudec
Thesis advisor: Prof. Rachid Guerraoui
Thesis co-advisor: Prof. Anne-Marie Kermarrec
Co-examiner: Prof. Boi Faltings

Abstract
Cloud computing plays an important role in reducing infrastructure costs by replacing on-premise data centers. Cloud providers achieve this cost reduction through economies of scale, attainable with multi-tenancy and proper resource utilization. One way of achieving the latter is disaggregation, which consists of separating servers into their constituent resources (computing, memory, and storage) and interconnecting them over a network. Each resource can then be allotted as required and scaled independently, suiting the needs of distinct workloads that use these resources in different proportions. Beyond reducing the providers' total cost of ownership (TCO), this avoids both overwhelming machines whose resources are under-provisioned and wasting resources on machines that are over-provisioned. As a consequence, however, extra pressure is put on the network layer, since it interconnects all disaggregated resources.
 
To leverage the improved resource utilization brought by disaggregation while preserving or improving application performance relative to monolithic servers, one has to minimize data movement. Typically, this is achieved by improving data locality, that is, by keeping compute units near the data they process. Notably, the most common approaches for enhancing data locality manipulate either the data (e.g., prefetching and caching) or the code (near-data processing (NDP), also called pushdown). This, however, calls both for some compute capability in the storage tier (e.g., a GPU alongside an array of disks) and for some storage capability in the compute tier (e.g., disks alongside an array of GPUs). Clearly, applications running on disaggregated cloud infrastructure can greatly benefit from these locality-enforcing techniques.
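As an illustration of the data-manipulation side, here is a minimal sketch of a compute-tier cache with sequential prefetching in front of a remote object store. The fetch callback, the block numbering, and the synchronous prefetch loop are all illustrative assumptions; a real system would prefetch asynchronously and size the cache against the tier's limited local storage.

```python
from collections import OrderedDict

class CachingReader:
    """LRU cache with sequential prefetch in front of a remote store."""

    def __init__(self, fetch, capacity=128, prefetch_depth=2):
        self.fetch = fetch                  # hypothetical callback: block id -> bytes (network read)
        self.capacity = capacity            # bounded: compute-tier storage is scarce
        self.prefetch_depth = prefetch_depth
        self.cache = OrderedDict()          # block id -> bytes, in LRU order

    def _put(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def read(self, block_id):
        if block_id in self.cache:          # cache hit: no network round trip
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.fetch(block_id)
        self._put(block_id, data)
        # Assuming sequential access, eagerly fetch the next few blocks;
        # a production reader would overlap this with computation.
        for nxt in range(block_id + 1, block_id + 1 + self.prefetch_depth):
            if nxt not in self.cache:
                self._put(nxt, self.fetch(nxt))
        return data
```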
 
Machine learning (ML) processing is a natural fit for cloud deployment, since it requires large amounts of both data (hence storage) and computing power. Under disaggregation, i.e., when the storage tier is decoupled from the compute tier, one has to decide which pieces of computation should run where. To tackle this problem, we have to consider that the internal storage bandwidth (i.e., between durable storage and the CPU) is far larger than the bandwidth of the network connecting the storage and compute tiers. A naive solution would therefore push down all computation to the cloud object storage (COS) and send only the result back to the requesting client. The problem is that this defeats the very benefits of disaggregating servers in the first place: such an approach would quickly saturate the computing resources of the storage tier, which are not optimized for large processing jobs, and hence hurt both application processing time and the experience of other COS users in a multi-tenant environment. At the other end, i.e., in the compute tier, one could think of prefetching data that is about to be used and caching it in case it is likely to be reused in the near future. Here the converse problem arises: since compute machines are not optimized for storing large amounts of data, one would rapidly exhaust their storage capacity.
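This trade-off can be made concrete with a back-of-the-envelope cost model. The sketch below compares pulling raw data to the compute tier against pushing an operator down to the COS; the 10 Gbps network and the 4x storage-CPU slowdown are illustrative assumptions, not measured values.

```python
def choose_placement(input_bytes, selectivity, compute_secs,
                     net_gbps=10.0, storage_slowdown=4.0):
    """Return ('pull' | 'pushdown', estimated seconds) for one operator.

    Pull: ship all input over the network, then compute on fast CPUs/GPUs.
    Pushdown: compute on the storage tier's weaker CPUs (slower by
    `storage_slowdown`), then ship only the reduced result.
    All parameters are hypothetical, for illustration only.
    """
    net_bytes_per_sec = net_gbps * 1e9 / 8
    pull = input_bytes / net_bytes_per_sec + compute_secs
    push = compute_secs * storage_slowdown \
         + (selectivity * input_bytes) / net_bytes_per_sec
    return ("pushdown", push) if push < pull else ("pull", pull)

# A selective filter over 100 GB favors pushdown; a compute-heavy
# training step over the same data favors pulling it to the GPUs.
print(choose_placement(100e9, selectivity=0.01, compute_secs=5))   # -> pushdown
print(choose_placement(100e9, selectivity=1.0, compute_secs=600))  # -> pull
```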
 
Thus, there is a need for smart solutions that decide how to split an ML computation between the COS and the compute tier, taking into account the generality of ML tasks as well as the concurrency and privacy aspects of the system.
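One simple instance of such a splitting decision, assuming per-operator metadata (selectivity and compute intensity) that a real system would have to estimate, is a greedy prefix split: keep cheap, data-reducing operators in the COS and cross to the compute tier at the first compute-heavy stage. The pipeline and cutoff below are hypothetical.

```python
# Hypothetical ML ingestion pipeline: (operator, selectivity, FLOPs per byte).
PIPELINE = [
    ("decode",  1.0,   5),
    ("filter",  0.1,   1),
    ("augment", 1.0,  50),
    ("train",   1.0, 500),
]

def split_pipeline(pipeline, intensity_cutoff=20):
    """Greedy prefix split: push operators down to the COS until the
    first compute-heavy one, then run the rest in the compute tier."""
    for i, (_name, _sel, intensity) in enumerate(pipeline):
        if intensity > intensity_cutoff:
            return pipeline[:i], pipeline[i:]
    return pipeline, []

storage_side, compute_side = split_pipeline(PIPELINE)
print([op for op, _, _ in storage_side])   # ['decode', 'filter']  -> COS
print([op for op, _, _ in compute_side])   # ['augment', 'train']  -> compute tier
```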


Practical information

  • General public
  • Free

