VMware Research | Building Scalable and Flexible Cluster Managers Using Declarative Programming

Abstract

Cluster managers like Kubernetes and OpenStack are notoriously hard to develop, given that they routinely grapple with hard combinatorial optimization problems like load balancing, placement, scheduling, and configuration. Today, cluster manager developers tackle these problems by developing system-specific best-effort heuristics, which achieve scalability by significantly sacrificing the cluster manager's decision quality, feature set, and extensibility over time. This is proving untenable, as solutions for cluster management problems are routinely developed from scratch in the industry to solve largely similar problems across different settings. We propose DCM, a radically different architecture where developers specify the cluster manager's behavior declaratively, using SQL queries over cluster state stored in a relational database. From the SQL specification, the DCM compiler synthesizes a program that, at runtime, can be invoked to compute policy-compliant cluster management decisions given the latest cluster state. Under the covers, the generated program efficiently encodes the cluster state as an optimization problem that can be solved using off-the-shelf solvers, freeing developers from having to design ad-hoc heuristics. We show that DCM significantly lowers the barrier to building scalable and extensible cluster managers. We validate our claim by powering three production-grade systems with it: a Kubernetes scheduler, a virtual machine management solution, and a distributed transactional datastore.
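The core idea in the abstract is that policies are expressed as SQL queries over relational cluster state, and a decision is any assignment that satisfies those queries. A toy sketch helps make this concrete. It is not DCM's API, schema, or compiler: the table names (`nodes`, `pods`, `assignment`), the two queries, and the `place` function are hypothetical. An exhaustive search over assignments also stands in for the off-the-shelf solver that DCM's generated program targets.

```python
import itertools
import sqlite3

# Hypothetical cluster state; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE nodes (name TEXT PRIMARY KEY, cpu_capacity INTEGER);
CREATE TABLE pods  (name TEXT PRIMARY KEY, cpu_request INTEGER);
-- Candidate decision: one row per pod, naming the node it is placed on.
CREATE TABLE assignment (pod TEXT PRIMARY KEY, node TEXT);
"""

# Hard constraint as a SQL query: any returned row is a violation
# (a node whose placed pods exceed its CPU capacity).
CAPACITY_VIOLATIONS = """
SELECT n.name
FROM nodes n
JOIN assignment a ON a.node = n.name
JOIN pods p ON p.name = a.pod
GROUP BY n.name, n.cpu_capacity
HAVING SUM(p.cpu_request) > n.cpu_capacity
"""

# Soft objective as a SQL query: CPU load on the busiest node
# (lower means a more balanced placement).
MAX_NODE_LOAD = """
SELECT COALESCE(MAX(load), 0) FROM (
  SELECT SUM(p.cpu_request) AS load
  FROM assignment a JOIN pods p ON p.name = a.pod
  GROUP BY a.node
)
"""


def place(conn):
    """Return the feasible pod->node mapping minimizing MAX_NODE_LOAD.

    Exhaustive search is a stand-in for a real solver and is only
    practical for tiny inputs.
    """
    pods = [r[0] for r in conn.execute("SELECT name FROM pods ORDER BY name")]
    nodes = [r[0] for r in conn.execute("SELECT name FROM nodes ORDER BY name")]
    best, best_cost = None, None
    for choice in itertools.product(nodes, repeat=len(pods)):
        # Materialize the candidate decision so the SQL policies can see it.
        conn.execute("DELETE FROM assignment")
        conn.executemany("INSERT INTO assignment VALUES (?, ?)", zip(pods, choice))
        if conn.execute(CAPACITY_VIOLATIONS).fetchone() is not None:
            continue  # violates a hard constraint
        cost = conn.execute(MAX_NODE_LOAD).fetchone()[0]
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(pods, choice)), cost
    return best, best_cost


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [("n1", 4), ("n2", 4)])
    conn.executemany("INSERT INTO pods VALUES (?, ?)",
                     [("a", 3), ("b", 2), ("c", 2)])
    print(place(conn))
```

On this toy input, pod `a` (3 CPUs) cannot share a 4-CPU node with `b` or `c`, so the search places `b` and `c` together and reports a maximum node load of 4. The enumeration grows as |nodes|^|pods|, which is why a practical system would hand an encoded optimization problem to a solver rather than enumerate candidates.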

Files

osdi20-suresh.pdf

Date

November 2020

Authors

Lalith Suresh
João Ferreira Loff
Faria Kalim
Sangeetha Abdu Jyothi
Nina Narodytska
Leonid Ryzhyk
Sahan Gamage
Brian Oki
Pranshu Jain
Michael Gasch

Related projects

Declarative Cluster Management

Research Areas

cluster management
declarative programming
distributed systems

Type

Inproceedings

Booktitle

14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)