VMware has achieved a SPECweb2005 benchmark score of 50,166 using VMware vSphere 4, a 14% improvement over the world record results previously published on VI3. Our latest results further strengthen the position of VMware vSphere as an industry leader in web serving, thanks to a number of performance enhancements and features that are included in this release. In addition to the measured performance gains, some of these enhancements will help simplify administration in customer environments.

The key highlights of the current results include:

  1. Highly scalable virtual SMP performance.
  2. Over 25% performance improvement for the most I/O intensive SPECweb2005 support component.
  3. Highly simplified setup with no device interrupt pinning.

Let me briefly touch upon each of these highlights.

Virtual SMP performance

The improved scheduler in ESX 4.0 enables the use of large symmetric multiprocessor (SMP) virtual machines for web-centric workloads. Our previous world record results published on ESX 3.5 used as many as fifteen uniprocessor (UP) virtual machines. The current results with ESX 4.0 used just four SMP virtual machines. This is made possible by several improvements that went into the CPU scheduler in ESX 4.0.

From a scheduler perspective, SMP virtual machines present additional considerations such as co-scheduling. With an SMP virtual machine, it is important for the ESX scheduler to present the applications and the guest OS running in the virtual machine with the illusion that they are running on a dedicated multiprocessor machine. ESX implements this illusion by co-scheduling the virtual processors of an SMP virtual machine. While the requirement to co-schedule all the virtual processors of a VM was relaxed in previous releases of ESX, the relaxed co-scheduling algorithm has been further refined in ESX 4.0. This gives the scheduler more choices when scheduling the virtual processors of a VM, which leads to higher system utilization and better overall performance in a consolidated environment.

ESX 4.0 has also improved its resource locking mechanism. The
locking mechanism in ESX 3.5 was based on the cell lock construct. A cell is a
logical grouping of physical CPUs in the system within which all the vCPUs of a
VM had to be scheduled. This has been replaced with per-pCPU and per-VM locks.
This fine-grained locking reduces contention and improves scalability. All
these enhancements enable ESX 4.0 to use SMP VMs and achieve this new level of SPECweb2005 performance.
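
To illustrate why finer-grained locks reduce contention, here is a minimal toy sketch in Python. It is not how ESX is implemented; the class `PerCpuRunQueues` and the function `run_demo` are hypothetical names invented for this illustration. Each physical CPU gets its own run queue and its own lock, so threads working on different pCPUs never wait on each other the way they would behind a single coarse lock.

```python
import threading
from collections import deque


class PerCpuRunQueues:
    """Toy model (hypothetical, not ESX code): one lock and one run queue per pCPU."""

    def __init__(self, num_pcpus):
        self._locks = [threading.Lock() for _ in range(num_pcpus)]
        self._queues = [deque() for _ in range(num_pcpus)]

    def enqueue(self, pcpu, vcpu):
        # Only the lock of the targeted pCPU is taken; other pCPUs stay free.
        with self._locks[pcpu]:
            self._queues[pcpu].append(vcpu)

    def dequeue(self, pcpu):
        # Returns the next vCPU queued on this pCPU, or None if the queue is empty.
        with self._locks[pcpu]:
            return self._queues[pcpu].popleft() if self._queues[pcpu] else None

    def length(self, pcpu):
        with self._locks[pcpu]:
            return len(self._queues[pcpu])


def run_demo(num_pcpus=4, per_thread=1000):
    """Start one thread per pCPU; each thread enqueues work only on its own pCPU."""
    queues = PerCpuRunQueues(num_pcpus)

    def worker(pcpu):
        for i in range(per_thread):
            queues.enqueue(pcpu, (pcpu, i))

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_pcpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [queues.length(p) for p in range(num_pcpus)]
```

Calling `run_demo(4, 100)` returns the final length of each per-pCPU queue. With a single shared lock, every `enqueue` call would serialize on that one lock no matter which pCPU it targeted.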

Very high performance gains for workloads with a large I/O component

I/O intensive applications highlight the performance enhancements of ESX 4.0. These tests show that high-I/O workloads yield the largest gains when upgrading to this release.

In all our tests, we used the SPECweb2005 benchmark, which measures a system's ability to act as a web server. It comprises three workloads that characterize different web usage patterns: Banking (emulates online banking), E-commerce (emulates an e-commerce site), and Support (emulates a vendor support site that provides downloads). The performance score of each workload is the number of simultaneous sessions the system is able to support while meeting the QoS requirements of that workload. The aggregate metric reported by SPECweb2005 normalizes the performance scores obtained on the three workloads.
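
As a rough sketch of what such a normalization can look like, the Python snippet below combines three per-workload scores into one number. It assumes the aggregate is a geometric mean of each workload's score divided by a reference score, which is our reading of the SPEC run rules and is not stated in this post. The function name `aggregate_score` and any reference values passed to it are illustrative, and any scaling constant the official metric applies is left out.

```python
import math


def aggregate_score(scores, reference):
    """Geometric mean of per-workload ratios (score / reference score).

    Assumption: this mirrors the general shape of the SPECweb2005
    normalization; the reference values are supplied by the caller and
    are placeholders here, not official SPEC reference scores.
    """
    ratios = [scores[name] / reference[name] for name in reference]
    # exp(mean(log(r))) is the geometric mean of the ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

A geometric mean has the property that a given percentage gain on any one workload moves the aggregate by the same factor, regardless of how large that workload's raw session count is.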

The following figure compares the scores of the
three workloads obtained on ESX 4.0 to the previous results on ESX 3.5. The
figure also highlights the percentage improvements obtained on ESX 4.0 over ESX
3.5. We used an HP ProLiant DL585 G5 server with four Quad-Core AMD Opteron processors
as the system under test. The benchmark results have been reviewed and approved
by the SPEC committee.


We used the same HP ProLiant DL585 G5 server and the same physical test infrastructure in the current benchmark submission as in the previous one on VI3. There were some differences between the two test configurations (for example, ESX 3.5 used UP VMs while ESX 4.0 used SMP VMs, and the ESX 4.0 tests were run on currently available processors that have a slightly higher clock speed). To highlight the performance gains, we will look at the percentage improvements obtained for all three workloads rather than the absolute numbers.

As you can see from the above figure, the biggest percentage gain was seen with the Support workload, which has the largest I/O component. In this test, a 25% gain was seen while ESX drove about 20 Gbps of web traffic. Of the three workloads, the Banking workload has the smallest I/O component, and accordingly it showed a relatively smaller percentage gain.
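
For readers who want to reproduce the percentage arithmetic, here is a small Python sketch. The helper names `percent_gain` and `implied_baseline` are our own. The second helper works backward from a score and a quoted gain; because quoted gains such as 14% are rounded, the baseline it returns is only approximate.

```python
def percent_gain(new_score, old_score):
    """Percentage improvement of new_score over old_score."""
    return (new_score - old_score) / old_score * 100.0


def implied_baseline(new_score, gain_percent):
    """Baseline score implied by a new score and a quoted percentage gain.

    The quoted gain is usually rounded, so treat the result as an estimate.
    """
    return new_score / (1.0 + gain_percent / 100.0)
```

For example, `implied_baseline(50166, 14)` estimates the earlier aggregate score implied by the headline figures in this post.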

Highly simplified setup

ESX 4.0 also simplifies customer environments without sacrificing performance. In our previous ESX 3.5 results, we pinned the device interrupts to make efficient use of hardware caches and improve performance. Binding device interrupts to specific processors is a common technique in SPECweb2005 benchmarking to maximize performance. Results published on the http://www.spec.org/osg/web2005 website reveal the complex pinning configurations used by the benchmark publishers in the native environment.
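
To give a sense of what manual pinning involves in a native Linux environment, the Python sketch below builds the hexadecimal CPU bitmask that Linux accepts in `/proc/irq/<irq>/smp_affinity`. This shows the general technique only; it is not the configuration used in any published result, and the IRQ numbers and CPU lists are made up. The helper names `affinity_mask` and `pinning_command` are ours. The sketch assumes CPU indices below 32, since larger systems write the mask as comma-separated 32-bit groups.

```python
def affinity_mask(cpus):
    """Return a hex bitmask string with one bit set per CPU index.

    Assumes every index is below 32, so the mask fits in one 32-bit group.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def pinning_command(irq, cpus):
    """Build (but do not run) the shell command that pins an IRQ to the CPUs."""
    return f"echo {affinity_mask(cpus)} > /proc/irq/{irq}/smp_affinity"
```

One such command is needed per device interrupt, and the right CPU choice depends on the cache and socket layout of the machine. That is the tuning effort ESX 4.0 removes.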

The much-improved I/O processing model in ESX 4.0 obviates the need for any manual device interrupt pinning. On ESX, the I/O requests issued by the VM are intercepted by the virtual machine monitor (VMM), which handles them in cooperation with the VMkernel. The improved execution model in ESX 4.0 processes these I/O requests asynchronously, which allows the vCPUs of the VM to execute other tasks.

Furthermore, the scheduler in ESX 4.0 schedules processing of network traffic based on processor cache architecture, which eliminates the need for manual device interrupt pinning. With the new core-offload I/O system and related scheduler improvements, the ESX 4.0 results surpass the ESX 3.5 results even though no device interrupts were pinned.


These SPECweb2005 results demonstrate that customers can expect substantial performance gains on ESX 4.0 for web-centric workloads. Our past results published on ESX 3.5 showed world-record performance in a scale-out configuration (increasing the number of virtual machines), and our current results on vSphere 4 demonstrate world-class performance while scaling up (increasing the number of vCPUs in a virtual machine). With an improved scheduler that required no fine-tuning for these experiments, VMware vSphere 4 can offer these gains while lowering the cost of administration.