Introduction
REEF is a development framework that provides a control-plane for
scheduling and coordinating task-level (data-plane) work on clusters.
Abstract
Resource Managers like Apache YARN have emerged as a critical layer
in the cloud computing system stack, but the developer abstractions
for leasing cluster resources and instantiating application logic
are very low-level. This flexibility comes at a high cost in terms
of developer effort, as each application must repeatedly tackle the
same challenges (e.g., fault-tolerance, task scheduling and
coordination) and re-implement common mechanisms (e.g., caching,
bulk-data transfers). This paper presents REEF, a development
framework that provides a control-plane for scheduling and
coordinating task-level (data-plane) work on cluster resources
obtained from a Resource Manager. REEF provides mechanisms that
facilitate resource re-use for data caching, and state management
abstractions that greatly ease the development of elastic data
processing work-flows on cloud platforms that support a Resource
Manager service. REEF is being used to develop several commercial
offerings such as the Azure Stream Analytics service. REEF is also
currently an Apache Incubator project that has attracted
contributors from several institutions.