Abstract

Buffer-and-flush is a technique for transforming standard external-memory search trees into write-optimized search trees. In exchange for faster amortized insertions, buffer-and-flush can sometimes significantly increase the latency of operations by causing cascades of flushes. In this paper, we show that flushing cascades are not a fundamental consequence of the buffer-flushing technique, and can be removed entirely using randomization techniques.

The underlying implementation of buffer flushing relies on a buffer-ejection strategy at each node in the tree. The ability for the user to select the buffer ejection strategy based on the workload has been shown to be important for performance, both in theory and in practice.

In order to support arbitrary buffer-ejection strategies, we introduce the notion of a universal flush, which uses a universal ejection policy that can simulate any other ejection policy. This abstracts away the underlying ejection strategy, even allowing for workload-specific strategies that change dynamically.

Our deamortization preserves the amortized throughput of the underlying flushing strategy on all workloads. In particular, with our deamortization and a node cache of size poly-logarithmic in the number of insertions performed on the tree, the amortized insertion cost matches the lower bound of Brodal and Fagerberg. For typical parameters, the lower bound is less than 1 I/O per insertion. For such parameters, our worst-case insertion cost is O(1) I/Os.

Date

January, 2020

Authors

  • Michael A. Bender
  • Rathish Das
  • Martin Farach-Colton
  • Rob Johnson
  • William Kuszmaul

Related projects

Research Areas

  • Data Structures
  • Databases
  • External-Memory Data Structures
  • Write-Optimized Data Structures

Type

Inproceedings

Booktitle

SODA