Last month we published a Tech Note summarizing networking throughput results for ESX Server 3.0.1 and XenEnterprise 3.2.0. Multiple NICs were used to achieve the maximum throughput possible in a single uniprocessor VM. While these results are very useful for evaluating the virtualization overhead of networking, a more common configuration is to spread the networking load across multiple VMs. We present results for multi-VM networking in a new paper just published. Each VM uses only a single 1 Gbps NIC, but up to four VMs run simultaneously. This simulates a consolidation scenario of several machines, each with substantial, but not extreme, networking I/O. Unlike the multi-NIC paper, there is no exact native analog, but we ran the same total load on an SMP native Windows machine for comparison. The results are similar to the earlier ones: ESX stays close to native performance, achieving up to 3400 Mbps in the 4-VM case. XenEnterprise peaks at 3 VMs and falls off to 62-69% of the ESX throughput with 4 VMs. According to the XenEnterprise documentation, only three physical NICs are supported in the host, even though the UI let us configure and run four physical NICs without error or warning. Given the performance, that limit is not surprising. We then tried a couple of experiments (such as letting dom0 use more than one CPU) to fix the bottleneck, but succeeded only in further reducing the throughput. The virtualization layer in ESX is always SMP and, together with a battle-tested scheduler and support for 32 e1000 NICs, scales to many heavily loaded VMs. Let us know if you’re able to reach the limits of ESX networking!
Networking performance is a critical part of the usability of any virtualized system. As such, several benchmarks have been published showing that a virtual machine can achieve near-native throughput. While this is a good result, it doesn’t tell the whole story. A shortcoming of most of these studies is that they use just a single 1 Gbps network interface card (NIC), which simply does not stress a modern system, virtualized or not. In an earlier paper, we addressed this by dedicating two NICs to a single VM and running an instance of netperf on each. Even this was not a challenge. Now, in a Tech Note just published, we attached three NICs to a single uniprocessor (UP) Windows VM under ESX 3.0.1 and measured total throughput. This gives close to the maximum throughput possible in a fully virtualized system. We compared this against native and against the comparable configuration under XenEnterprise 3.2.0, the most recent version of XenEnterprise available. While both ESX 3.0.1 and XenEnterprise 3.2.0 yield near-native performance using one NIC, ESX gives considerably better performance with two or three NICs, demonstrating ESX’s lower overhead and better scalability.
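As a rough illustration of the measurement described above, where one netperf instance runs per NIC and the per-NIC results are added up into a total, the following minimal sketch sums per-stream throughput and expresses it relative to a native baseline. The function names and every numeric input are hypothetical placeholders chosen for illustration; they are not figures from the Tech Note, and the sketch does not invoke netperf itself.

```python
from typing import Sequence


def total_throughput_mbps(per_nic_mbps: Sequence[float]) -> float:
    """Sum the throughput (in Mbps) reported by each per-NIC benchmark stream."""
    if not per_nic_mbps:
        raise ValueError("need at least one per-NIC measurement")
    return float(sum(per_nic_mbps))


def fraction_of_native(
    virtual_mbps: Sequence[float], native_mbps: Sequence[float]
) -> float:
    """Return total virtualized throughput as a fraction of total native throughput."""
    native_total = total_throughput_mbps(native_mbps)
    if native_total <= 0:
        raise ValueError("native total throughput must be positive")
    return total_throughput_mbps(virtual_mbps) / native_total


if __name__ == "__main__":
    # Hypothetical placeholder readings, one value per NIC (not measured data).
    native_run = [900.0, 900.0, 900.0]
    vm_run = [800.0, 850.0, 780.0]

    print(f"VM total: {total_throughput_mbps(vm_run):.0f} Mbps")
    print(f"Fraction of native: {fraction_of_native(vm_run, native_run):.2f}")
```

Reporting the sum across NICs rather than a per-NIC average matches the "total throughput" framing used here, and the ratio to native is one simple way to express virtualization overhead.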
At VMworld last November I had the opportunity to talk to many ESX users and to discover for myself what performance issues were most on their minds. As it turned out, this endeavor was not very successful; everybody was generally happy with ESX performance. On the other hand, the performance and best-practice talks were among the most popular, indicating that users were very interested in learning new ways of getting the most out of ESX. VMworld was simply the wrong audience for reaching people who had concerns about performance. I was preaching to the choir instead of to the non-virtualized souls out there. At the same time, aggressive marketing by other virtualization companies creates confusion about ESX performance. So we decided that we needed to make a better effort at clearing up misconceptions and providing real performance data, especially to enterprises just starting to consider their virtualization options.
A Performance Comparison of Hypervisors is the first fruit of this effort. In this paper we consider a variety of simple benchmarks running in a Windows guest on both ESX 3.0.1 and the open-source version of Xen 3.0.3. We chose Windows guests for this first paper since Windows is the most widely used OS on x86 systems. We used open-source Xen 3.0.3 for these tests since it was the only Xen variant that supported Windows guests at the time we ran the tests. Everything was run on an IBM X3500 with two dual-core Intel Woodcrest processors. Xen used the hardware-assist capabilities of this processor (Intel VT) in order to run an unmodified guest, while ESX used VMware’s very mature Binary Translation technology. The results might not be what you expect from reading marketing material! Even for CPU and memory benchmarks dominated by direct execution, Xen shows significantly more overhead than ESX. The difference is bigger for a compilation workload, and huge for networking. The latter is due mostly to a lack of open-source paravirtualized (PV) device drivers for Windows. PV drivers are available in some commercial products based on Xen and should give much better performance. Xen was not able to run SPECjbb2005 at all, since SMP Windows guests were not supported at the time the tests were done. This support was added very recently in Xen 3.0.4; however, the commercial products are still on Xen 3.0.3. ESX has had PV network drivers (vmxnet) and has been able to run SMP Windows guests for years.
We are currently exploring the many dimensions of the performance matrix: 64-bit, Linux guests, AMD processors, more complex benchmarks, etc. Results will be posted to VMTN as they are obtained. Readers are encouraged to run their own tests and measure the performance for themselves.
Please give us your feedback on this paper, and tell us how useful this kind of work is to you in general. And if ESX fans find this paper informative, so much the better!