Modern data centers often employ non-uniform memory access (NUMA) machines with multiple NUMA nodes. Each node consists of some (hardware) threads and local memory. A thread can access memory in any node, but accessing local memory is faster than accessing remote memory at another node. To obtain the best performance, concurrent data structures must take this fact into consideration: they must be “NUMA-aware”. Unfortunately, designing NUMA-aware concurrent data structures is hard. Designing concurrent data structures is already difficult, and making them NUMA-aware is even harder because the algorithm must ensure that most accesses go to local memory. In this work, we show how to obtain NUMA-aware concurrent data structures automatically. Starting from a sequential implementation of the data structure, we propose an algorithm called Node Replication (NR) that transforms the implementation into a NUMA-aware concurrent implementation. NR relies on three techniques: replication, an efficient log data structure, and flat combining. We describe this algorithm in Section 2. In Appendix A, we discuss some subtle intricacies of NR essential for performance; we then show that, despite its optimizations, NR produces linearizable data structures. We give NR’s full pseudocode in Appendix B.

Related projects