Designing at rack scale for simplicity, better performance, and lower cost
We are exploring new distributed system designs for one or more racks of computers that take advantage of current and emerging hardware technologies, such as accelerators and disaggregated memory.
Distributed systems are extremely complex for many reasons, including concurrency, failures, heterogeneity, and variable scale. In this project, we study distributed systems in a narrower domain, one or a few racks, which brings several benefits. First, racks eliminate two of these sources of complexity, heterogeneity and variable scale. Racks are homogeneous, with only a small number of node types (differing in their ratios of compute, memory, cache, and storage), and they operate at modest scale (a few dozen nodes). Second, racks can adopt the latest hardware technologies, such as accelerators and disaggregated memory, each of which brings distinct capabilities.