Bruce Herndon of VMware’s performance team uses the VMmark benchmark to examine how ESX Server uses NUMA (non-uniform memory access) as it scales to large numbers of virtual machines. Link: Studying NUMA with VMmark.
In a NUMA system, the processors are divided into sets, also known as nodes. Each node has direct, local access to a portion of the overall system memory and must communicate via a network interconnect to access the remaining, remote memory at other NUMA nodes. The memory interconnect adds latency to remote memory accesses, making them slower than local ones. Applications that heavily utilize the faster local memory tend to perform better than those that don’t. VMware ESX Server is fully NUMA-aware and exploits local memory to improve performance.

…All in all, I am quite pleased with the results. They tell us that we need not worry about overstressing NUMA systems even as vendors make quad-core processors ubiquitous. In fact, I would say that virtual environments are a great match for commodity NUMA-based multi-core systems due to the encapsulation of memory requests within a virtual machine, which creates a largely local access pattern and limits stress on the memory subsystem. Of equal importance, these results show that the ESX scheduler exploits these types of systems well, which is good to see given how much work I know our kernel team has put into it. This type of exercise is just another area where a robust, stable, and representative virtualization benchmark like VMmark can prove invaluable.
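
To make the local-versus-remote trade-off in the excerpt concrete, here is a back-of-the-envelope sketch in Python. It is not taken from Herndon's post: the function name `average_latency_ns` and the latency figures (100 ns local, 150 ns remote) are illustrative assumptions, and real systems vary widely. The model simply weights each latency by the fraction of accesses that hit that kind of memory.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Weighted-average memory latency for a given share of local accesses.

    local_fraction: share of accesses served by the node's own memory (0..1).
    local_ns / remote_ns: assumed per-access latencies, in nanoseconds.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies, chosen only for illustration.
    LOCAL_NS = 100.0
    REMOTE_NS = 150.0

    for fraction in (0.5, 0.9, 1.0):
        latency = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
        print(f"{fraction:.0%} local accesses: {latency:.1f} ns on average")
```

Under these assumed numbers, a workload whose accesses are mostly local sees an average latency close to the local figure, which is the intuition behind the claim that a virtual machine's largely local access pattern suits NUMA hardware.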