Modern data centers often employ non-uniform memory access (NUMA) machines with multiple
NUMA nodes. Each node consists of some (hardware) threads and local memory. A thread can access
memory in any node, but accessing local memory is faster than accessing (remote) memory at another node. To
obtain the best performance, concurrent data structures must take this fact into consideration: they
must be “NUMA-aware”. Unfortunately, designing NUMA-aware concurrent data structures is hard.
Designing concurrent data structures is already difficult, but designing them to be NUMA-aware is
even harder, because the algorithm must ensure that most memory accesses are local.
In this work, we show how to obtain NUMA-aware concurrent data structures automatically.
Starting from a sequential implementation of the data structure, we propose an algorithm called
Node Replication (NR) that transforms it into a NUMA-aware concurrent implementation.
NR relies on three techniques: replication, an efficient log data structure, and flat combining.
We describe this algorithm in Section 2. In Appendix A, we discuss some subtle intricacies of NR
essential for performance; we then show that, despite its optimizations, NR produces linearizable
data structures. We give NR’s full pseudocode in Appendix B.
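The core mechanism can be illustrated with a small sketch: each NUMA node holds its own replica of the sequential data structure, updates are appended to a shared log, and each replica replays the log up to its local tail before serving an operation. The sketch below is a simplified, single-process illustration of this replication-plus-log idea under assumed names (`NodeReplicated`, `execute_update`, `execute_read` are hypothetical); it omits flat combining and the other optimizations the paper relies on for performance.

```python
import threading

class NodeReplicated:
    """Illustrative sketch of Node Replication (NR), not the real API.

    A shared log of update operations is appended to under a lock; each
    node keeps its own replica of the sequential structure plus a local
    tail recording how much of the log it has already applied.
    """

    def __init__(self, sequential_factory, num_nodes=2):
        self.log = []                       # shared operation log
        self.log_lock = threading.Lock()    # guards appends to the log
        self.replicas = [sequential_factory() for _ in range(num_nodes)]
        self.tails = [0] * num_nodes        # per-replica applied prefix
        self.replica_locks = [threading.Lock() for _ in range(num_nodes)]

    def _catch_up(self, node):
        # Replay any log entries this replica has not yet applied.
        while self.tails[node] < len(self.log):
            op, args = self.log[self.tails[node]]
            getattr(self.replicas[node], op)(*args)
            self.tails[node] += 1

    def execute_update(self, node, op, *args):
        # Append the update to the shared log, then bring the
        # local replica up to date by replaying the log.
        with self.log_lock:
            self.log.append((op, args))
        with self.replica_locks[node]:
            self._catch_up(node)

    def execute_read(self, node, op, *args):
        # Reads never touch the log: catch up locally, then query
        # the local replica, so the access pattern stays node-local.
        with self.replica_locks[node]:
            self._catch_up(node)
            return getattr(self.replicas[node], op)(*args)
```

For example, wrapping a plain Python `list` as the sequential structure, an `append` issued on node 0 becomes visible to a read on node 1 once that node's replica replays the log; this is the sense in which every replica observes the same totally ordered sequence of updates.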