Introduction
Designing rack-scale system software for data-intensive distributed workloads using disaggregated resources.
Summary
We are designing new system-software infrastructure and frameworks for data-intensive distributed workloads, such as ML pipelines, running at the rack scale. These workloads exchange data frequently across the system, and the cost of those exchanges impairs their performance. We are exploring ways to optimize these data exchanges and improve application performance using disaggregated resources such as memory pools, CPUs, and accelerators.
Details
Current and emerging distributed workloads are based on distributed data flows (e.g., ML pipelines including training, ETL pipelines, streaming pipelines). Data-flow workloads require frequent exchange of data among tasks running across the system. This project investigates new ways to improve the efficiency of such workloads by optimizing these data exchanges using disaggregated resources such as memory pools, GPUs, and FPGA accelerators, and by using interconnect technologies such as RDMA and CXL. Roughly speaking, these technologies permit tasks to share data while avoiding the overheads of serialization, deserialization, data copying between user space and kernel space, and data copying across hosts. We aim to apply these technologies while shielding applications from their complexity by defining new memory-as-a-service abstractions that distributed data flows can use.
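To make the zero-copy idea concrete, the sketch below shows a minimal single-host analogue of shared-memory data exchange: two tasks (processes) pass a NumPy array through a shared-memory segment, avoiding serialization and per-task copies. This is only an illustration of the general technique, not the project's memory-as-a-service API; the segment layout, shape, and function names are assumptions.

```python
# Illustrative sketch: zero-copy data exchange between two local "tasks"
# via a shared-memory segment (a single-host stand-in for a disaggregated
# memory pool). All names and parameters here are hypothetical.
from multiprocessing import Process, shared_memory
import numpy as np

def producer(name: str, shape, dtype):
    # Attach to the existing segment and write the data in place:
    # no serialization step, no copy into a socket buffer.
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[:] = np.arange(shape[0], dtype=dtype)
    shm.close()

def exchange():
    shape, dtype = (8,), np.int64
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        p = Process(target=producer, args=(shm.name, shape, dtype))
        p.start()
        p.join()
        # The consumer maps the same physical pages the producer wrote,
        # so no deserialization or cross-task copy is needed.
        view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        result = view.copy()  # copy out only before freeing the segment
    finally:
        shm.close()
        shm.unlink()
    return result

if __name__ == "__main__":
    print(exchange())
```

Over a real fabric, RDMA or CXL plays the role of the shared segment, letting remote tasks read and write the same memory region, but with added concerns (registration, coherence, failure handling) that a memory-as-a-service abstraction would hide.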
Researchers
2021 Interns
External Researchers