...for their work on “IOctopus: Outsmarting Nonuniform DMA.”

ASPLOS is the premier forum for interdisciplinary systems research, intersecting computer architecture, hardware and emerging technologies, programming languages and compilers, operating systems, and networking.

Authors: Igor Smolyar, Alex Markuze (Technion), Boris Pismenny (Technion, Mellanox), Haggai Eran (Technion, Mellanox), Gerd Zellweger (VMware Research), Austin Bolen (Dell), Liran Liss (Mellanox), Adam Morrison (Tel Aviv University), and Dan Tsafrir (Technion, VMware Research).

Abstract: In a multi-CPU server, memory modules are local to the CPU to which they are connected, forming a nonuniform memory access (NUMA) architecture. Because non-local accesses are slower than local accesses, the NUMA architecture might degrade application performance. Similar slowdowns occur when an I/O device issues nonuniform DMA (NUDMA) operations, as the device is connected to memory via a single CPU. NUDMA effects therefore degrade application performance similarly to NUMA effects. We observe that the similarity is not inherent but rather a product of disregarding the intrinsic differences between I/O and CPU memory accesses. Whereas NUMA effects are inevitable, we show that NUDMA effects can and should be eliminated. We present IOctopus, a device architecture that makes NUDMA impossible by unifying multiple physical PCIe functions (one per CPU) in a manner that makes them appear as one, both to the system software and externally to the server. IOctopus requires only a modest change to the device driver and firmware. We implement it on existing hardware and demonstrate that it improves throughput and latency by as much as 2.7x and 1.28x, respectively, while ridding developers of the need to combat (what appeared to be) an unavoidable type of overhead.
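To make the locality argument concrete, here is a minimal illustrative sketch. It is not the authors' driver or firmware. It models a device that exposes one PCIe function per CPU socket and always issues DMA through the function attached to the same node as the target buffer. That is one simplified reading of "one per CPU" in the abstract. The names `PcieFunction`, `is_remote_dma`, and `pick_function` are hypothetical. The sketch contrasts this with a conventional single-function device attached to node 0.

```python
from dataclasses import dataclass


# Hypothetical model: the same physical device exposes one PCIe function per
# CPU socket (NUMA node), each attached through that socket's PCIe link.
@dataclass(frozen=True)
class PcieFunction:
    name: str       # hypothetical identifier, e.g. "fn0"
    numa_node: int  # socket whose PCIe link this function hangs off


def is_remote_dma(device_node: int, buffer_node: int) -> bool:
    """A DMA is nonuniform (NUDMA) when the issuing PCIe link sits on a
    different node than the memory holding the target buffer."""
    return device_node != buffer_node


def pick_function(functions: list[PcieFunction], buffer_node: int) -> PcieFunction:
    """Simplified IOctopus-style routing (an assumption, not the paper's code):
    use the function local to the buffer's node, so the DMA stays local."""
    for fn in functions:
        if fn.numa_node == buffer_node:
            return fn
    raise LookupError(f"no PCIe function attached to node {buffer_node}")


if __name__ == "__main__":
    functions = [PcieFunction("fn0", 0), PcieFunction("fn1", 1)]
    for buffer_node in (0, 1):
        chosen = pick_function(functions, buffer_node)
        # Conventional device: a single function attached to node 0.
        print(f"buffer on node {buffer_node}: "
              f"single-function remote={is_remote_dma(0, buffer_node)}, "
              f"IOctopus uses {chosen.name} "
              f"remote={is_remote_dma(chosen.numa_node, buffer_node)}")
```

In this toy model, the single-function device performs a remote DMA whenever the buffer lives on node 1, while the per-socket routing never does. That mirrors the abstract's claim that NUDMA can be eliminated rather than merely mitigated.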