A theoretical analysis of insertion buffers, which are widely used to accelerate insertions into B-trees and other database indexes.


Balls-and-bins games have been a wildly successful tool for modeling load balancing problems. In this paper, we study a new scenario, which we call the ball recycling game, defined as follows:

Throw m balls into n bins i.i.d. according to a given probability distribution p. Then, at each time step, pick a non-empty bin and recycle its balls: take the balls from the selected bin and re-throw them according to p.

This balls-and-bins game closely models memory-access heuristics in databases. The goal is to have a bin-picking method that maximizes the recycling rate, defined to be the expected number of balls recycled per step in the stationary distribution. We study two natural strategies for ball recycling: FB, which greedily picks the bin with the maximum number of balls, and RB, which picks a ball at random and recycles its bin. We show that for general p, RB is O(1)-optimal, whereas FB can be pessimal. However, when p=u, the uniform distribution, FB is optimal to within an additive constant.


January, 2019


Related projects


  • Databases
  • External-Memory Algorithms