Machine Learning Training and Deployment in Disaggregated Architectures

Event details
Date | 28.02.2023 |
Hour | 14:00 – 16:00 |
Speaker | Diana Andreea Petrescu |
Category | Conferences - Seminars |
EDIC candidacy exam
Exam president: Prof. Jean-Yves Le Boudec
Thesis advisor: Prof. Rachid Guerraoui
Thesis co-advisor: Prof. Anne-Marie Kermarrec
Co-examiner: Prof. Boi Faltings
Abstract
Cloud computing plays an important role in reducing infrastructure costs by replacing on-premise data centers. Cloud providers achieve this cost reduction through economies of scale, which are attainable with multi-tenancy and proper resource utilization. One way of achieving the latter is disaggregation, which consists of separating servers into their constituent resources (computing, memory, and storage) and interconnecting them over a network. This way, each resource can be allotted as required and scaled independently, suiting the needs of distinct workloads that make disproportionate use of such resources. This potentially prevents both overwhelming machines with under-provisioned resources and wasting resources on over-provisioned ones, in addition to reducing providers' total cost of ownership (TCO). As a consequence, however, extra pressure is put on the network layer, since it interconnects all disaggregated resources.
To leverage the improved resource utilization brought by disaggregation while preserving or improving application performance compared to monolithic servers, one has to minimize data movement. Typically, this is achieved by improving data locality, i.e., by keeping compute units near the data they process. Notably, the most common approaches for enhancing data locality consist of moving either data (e.g., prefetching and caching) or code (near-data processing (NDP) or pushdown). This, however, calls both for some compute capability in the storage tier (e.g., a GPU alongside an array of disks) and for some local storage capability in the compute tier (e.g., disks alongside an array of GPUs). Applications running on disaggregated cloud infrastructure can clearly benefit from these locality-enforcing techniques.
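To make the code-to-data direction concrete, the sketch below contrasts the two options for a simple filtering job on a cloud object store: fetching the whole object and filtering on the compute tier, versus pushing the filter down with S3 Select, one existing pushdown facility in commodity object storage (and the one exercised by FlexPushdownDB, listed below). This is only an illustrative sketch; the bucket, key, and column names are hypothetical.

```python
import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Option 1: data-to-code -- fetch the whole object over the network,
# then filter it on the compute tier.
obj = s3.get_object(Bucket="example-bucket", Key="datasets/train.csv")
rows = [line for line in obj["Body"].iter_lines()
        if line.endswith(b",1")]  # keep only rows whose last column (label) is 1

# Option 2: code-to-data (pushdown) -- evaluate the filter inside the
# storage service and ship back only the matching bytes.
resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="datasets/train.csv",
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s WHERE s.label = '1'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)
selected = b"".join(
    event["Records"]["Payload"] for event in resp["Payload"] if "Records" in event
)
```

Whether the pushed-down variant actually pays off depends on the filter's selectivity and on how loaded the storage tier is, which is exactly the tradeoff discussed next.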
Machine learning (ML) processing is a natural fit for cloud deployment because it requires large amounts of both data (hence storage) and computing power. With disaggregation, i.e., when storage is decoupled from the compute tier, one has to decide which pieces of computation should run where. To tackle this problem, we have to consider that the internal storage bandwidth (i.e., between durable storage and the CPU) is much larger than the network bandwidth connecting the storage and compute tiers. A naive solution would therefore push down all computation to the cloud object storage (COS) and send only the result back to the requesting client. The problem is that this defeats the very benefits of disaggregating servers in the first place: such an approach would quickly saturate the computing resources of the storage tier, which are not optimized for large processing jobs, and hence degrade both the application's processing time and the experience of other COS users in a multi-tenant environment. At the other end, i.e., in the compute tier, one could prefetch data that is about to be used and cache it if it is likely to be reused in the near future. Here the converse problem arises: since compute machines are not optimized for storing large amounts of data, one would rapidly exhaust their storage capacity.
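As a rough illustration of this tradeoff, the back-of-the-envelope model below estimates when pushing a filter down to the storage tier beats fetching the whole object. All bandwidth figures and the storage-side slowdown factor are assumptions made for the sake of the example, not measured values.

```python
def fetch_time(obj_bytes, net_bw):
    """Fetch the entire object over the storage-compute network link."""
    return obj_bytes / net_bw

def pushdown_time(obj_bytes, selectivity, internal_bw, net_bw, storage_slowdown=4.0):
    """Scan the object with the (weaker) storage-side CPU, then ship only the
    selected fraction over the network. `storage_slowdown` models how much
    slower the storage tier processes data than it can read it locally."""
    scan = obj_bytes / internal_bw * storage_slowdown
    ship = obj_bytes * selectivity / net_bw
    return scan + ship

if __name__ == "__main__":
    GB = 1e9
    obj = 100 * GB           # hypothetical training dataset size
    internal_bw = 10 * GB    # assumed storage-local bandwidth (bytes/s)
    net_bw = 1.25 * GB       # assumed ~10 Gbit/s network link (bytes/s)
    for sel in (0.01, 0.1, 0.5, 1.0):
        print(f"selectivity={sel:4.2f}  fetch={fetch_time(obj, net_bw):6.1f}s  "
              f"pushdown={pushdown_time(obj, sel, internal_bw, net_bw):6.1f}s")
```

Under these assumptions, pushdown wins for selective queries but loses once most of the data has to cross the network anyway, and the picture changes again when the storage tier is shared among many tenants.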
Thus, there is a need for smart solutions that decide how to split an ML computation between the COS and the compute tier, taking into account the generality of ML tasks and the concurrency and privacy aspects of the system.
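One simple way to frame such a split decision, in the spirit of the Neurosurgeon paper listed under the background papers, is to profile each stage of the ML pipeline and pick the partition point that minimizes storage-side compute time plus network transfer time plus compute-side time. The per-layer profiles and tier speeds below are hypothetical placeholders, not numbers from the thesis.

```python
def best_split(layers, storage_flops, compute_flops, network_bw):
    """Choose the layer index at which to hand off from the storage tier
    (weak compute, next to the data) to the compute tier (strong compute,
    across the network). Returns (split_index, estimated_latency_seconds)."""
    best = (0, float("inf"))
    for split in range(len(layers) + 1):
        storage_time = sum(l["flops"] for l in layers[:split]) / storage_flops
        compute_time = sum(l["flops"] for l in layers[split:]) / compute_flops
        # What crosses the network: the raw input if nothing runs storage-side,
        # otherwise the intermediate activation produced at the split point.
        transfer_bytes = layers[split - 1]["out_bytes"] if split > 0 else layers[0]["in_bytes"]
        latency = storage_time + transfer_bytes / network_bw + compute_time
        if latency < best[1]:
            best = (split, latency)
    return best

# Hypothetical profile: FLOPs per layer and input/output activation sizes.
layers = [
    {"flops": 2e9,  "in_bytes": 600e6, "out_bytes": 50e6},  # cheap, data-reducing layer
    {"flops": 8e9,  "in_bytes": 50e6,  "out_bytes": 50e6},
    {"flops": 40e9, "in_bytes": 50e6,  "out_bytes": 1e6},   # compute-heavy tail
]
print(best_split(layers, storage_flops=1e12, compute_flops=50e12, network_bw=1.25e9))
```

With these placeholder numbers, the best plan runs only the first, data-reducing layer near the storage and ships the small activation to the compute tier. A real solution additionally has to account for the generality of ML tasks, for concurrency (the storage tier is shared), and for privacy constraints, as noted above.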
Background papers
- Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. ACM SIGARCH Computer Architecture News, 45(1):615–629, 2017. https://www.cl.cam.ac.uk/~ey204/teaching/ACS/R244_2019_2020/papers/kang_asplos_2017.pdf
- Yifei Yang, Matt Youill, Matthew Woicik, Yizhou Liu, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. FlexPushdownDB: Hybrid pushdown and caching in a cloud DBMS. Proceedings of the VLDB Endowment, 2021. https://ashraf.aboulnaga.me/pubs/pvldb21flexpushdowndb.pdf
- Changho Hwang, Taehyun Kim, Sunghyun Kim, Jinwoo Shin, and KyoungSoo Park. Elastic resource sharing for distributed deep learning. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21), pages 721–739. USENIX Association, April 2021. https://www.usenix.org/system/files/nsdi21-hwang.pdf
Practical information
- General public
- Free