VMware Research | Online List Labeling: Breaking the log^2 n Barrier

Abstract

The paper studies the online list-label problem where we want to store n items in sorted order in an array with m slots while minimizing the total number of movements per insertion/deletion. There has been extensive literature on the problem in different regimes of the comparison between n and m. The paper focuses on the regime where m is linear in n. In this regime, previous algorithms achieve O((log n)^2) movements per insertion/deletion and this is known to be tight for deterministic algorithms. The main result of the paper is a new randomized algorithm with O((log n)^1.5) cost per operation. The new algorithm is also history independent: the distribution of the state only depends on the set of items, not the history of insertions and deletions. The paper also gives a lower bound showing that for history independent algorithms and m<(1+epsilon)*n with a sufficiently small constant epsilon, the cost of (log n)^1.5 is required.<\p>

The previous deterministic algorithm uses weight balanced binary tree where the two sides are within 1+/-1/logn of each other. The paper first constructs a randomized algorithm that is fast for a random insertion/deletion also using a binary tree but with a relaxed balancing restriction. The balancing factor is random and while it could be as strict as the deterministic algorithm, on average it is only 1+/-1/sqrt(logn), which lead to faster running time. The algorithm is constructed to be history independent: the splitting point at each node of the tree only depends on the set of items, not the history. Thanks to being history independent, an algorithm with fast worst case expected time is obtained from the algorithm with fast average expected time by padding the input set with a random number of dummy elements of value -infinity.<\p>

The paper solves an interesting problem theoretically. I am not sure about the scope of applications as one can use binary search trees instead in most cases and get superior performance. However, there are applications in the IO model that currently requires this data structure so I think the problem is justified.<\p>

Date

November, 2022

Authors

Alex Conway
Martin Farach-Colton
William Kuszmaul
Hanna Komlos
Michael A Bender
Nicole Wein

Research Areas

Data Structures
Lower Bounds

Type

Proceedings

Journal

FOCS