Introduction

Designing rack-scale system software for data-intensive distributed workloads using disaggregated resources.

Summary

We are designing new system software infrastructure and frameworks for data-intensive distributed workloads, such as ML pipelines, running at rack scale. These workloads incur frequent data exchanges across the system, which impair their performance. We are exploring ways to optimize these exchanges and improve application performance using disaggregated resources such as memory pools, CPUs, and accelerators.

Details

Current and emerging distributed workloads are built around distributed data flows (e.g., ML training pipelines, ETL pipelines, and streaming pipelines). Data flow workloads require frequent exchange of data among tasks running across the system. This project investigates new ways to improve the efficiency of such workloads by optimizing these data exchanges using disaggregated resources, such as memory pools, GPUs, and FPGA accelerators, connected over fabrics such as RDMA and CXL. Roughly speaking, these technologies permit tasks to share data while avoiding the overheads of serialization, deserialization, copying between user space and kernel space, and copying across hosts. We aim to apply these technologies while shielding applications from their complexity by defining new memory-as-a-service abstractions that distributed data flows can use.
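To make the overheads concrete, the sketch below is a minimal single-host illustration, not this project's API: two tasks exchange a large array through a named shared-memory segment using Python's standard multiprocessing.shared_memory module, so the consumer operates on the producer's bytes in place, with no serialization, deserialization, or copy through a socket. The segment size, shape, and payload are hypothetical.

    import numpy as np
    from multiprocessing import Process, shared_memory

    SHAPE = (1_000_000,)   # hypothetical payload: ~8 MB of float64s
    DTYPE = np.float64

    def consumer(seg_name):
        # Attach to the producer's segment and operate on its bytes in
        # place: no deserialization and no extra copy of the payload.
        shm = shared_memory.SharedMemory(name=seg_name)
        arr = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
        print("sum =", arr.sum())
        shm.close()

    if __name__ == "__main__":
        # Producer: allocate a named shared segment and write the task's
        # output directly into it, instead of serializing it to a socket.
        size = int(np.prod(SHAPE)) * np.dtype(DTYPE).itemsize
        shm = shared_memory.SharedMemory(create=True, size=size)
        arr = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
        arr[:] = 42.0      # stand-in for a real task's result

        p = Process(target=consumer, args=(shm.name,))
        p.start()
        p.join()

        shm.close()
        shm.unlink()       # release the segment once both sides are done

A memory-as-a-service abstraction of the kind described above would generalize this pattern beyond a single host: the segment would live in a disaggregated memory pool reachable over a fabric such as RDMA or CXL, with naming, placement, and lifetime managed by the service rather than by the tasks themselves.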

Researchers

External Researchers

  • Nadav Amit
  • Ryan Stutsman