Measuring the Cost of SMP with Mixed Workloads
It is no secret that vSphere 4.0 delivers excellent performance and provides the capability to virtualize the beefiest of workloads. Several impressive performance studies using ESX 4.0 have already been presented. (My favorite is this database performance whitepaper.) However, I continue to hear questions about the scheduling overhead of larger VMs within a heavily utilized, mixed-workload environment. We put together a study using simple variations of VMware’s mixed-workload consolidation benchmark, VMmark, to help answer this question.
For this study we chose two of the VMmark workloads, database and web server, as the vCPU-scalability targets. These VMs represent workloads that typically show the greatest range of load in production environments, so they are natural choices for a scalability assessment. We varied the number of vCPUs in these two VMs between one and four, then measured the throughput scaling and CPU utilization of each configuration while increasing the number of benchmark tiles up to and beyond system saturation.
The standard VMmark workload levels were used and were held constant for all tests. Because the workload is constant, we are measuring the cost of SMP VMs and their impact on the scheduler. This approach places increasing stress on the hypervisor as the vCPU allocations increase and creates a worst-case scenario for the scheduler. The vCPU allocations for the three configurations are shown in the table below:
| | Webserver vCPUs | Database vCPUs | Fileserver vCPUs | Mailserver vCPUs | Javaserver vCPUs | Standby vCPUs | Total vCPUs |
|---|---|---|---|---|---|---|---|
| Config1 | 1 | 1 | 1 | 2 | 2 | 1 | 8 |
| Config2 | 2 | 2 | 1 | 2 | 2 | 1 | 10 |
| Config3 | 4 | 4 | 1 | 2 | 2 | 1 | 14 |
Config2 uses the standard VMmark vCPU allocation of 10 vCPUs per tile. Config1 contains 20% fewer vCPUs than the standard while Config3 contains 40% more than the standard.
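The per-tile totals and the percentages quoted above follow directly from the allocation table. A minimal Python sketch of that arithmetic is shown here; the dictionary and function names are illustrative only and are not part of VMmark or any VMware tool.

```python
# vCPUs assigned to each VM in one tile, copied from the allocation table.
# The names below are hypothetical helpers for illustration only.
VCPUS_PER_VM = {
    "Config1": {"Webserver": 1, "Database": 1, "Fileserver": 1,
                "Mailserver": 2, "Javaserver": 2, "Standby": 1},
    "Config2": {"Webserver": 2, "Database": 2, "Fileserver": 1,
                "Mailserver": 2, "Javaserver": 2, "Standby": 1},
    "Config3": {"Webserver": 4, "Database": 4, "Fileserver": 1,
                "Mailserver": 2, "Javaserver": 2, "Standby": 1},
}


def vcpus_per_tile(config: str) -> int:
    """Sum the vCPUs of the six VMs that make up one tile."""
    return sum(VCPUS_PER_VM[config].values())


def percent_vs_standard(config: str, standard: str = "Config2") -> float:
    """Signed percentage difference in vCPUs per tile relative to the standard."""
    base = vcpus_per_tile(standard)
    return 100.0 * (vcpus_per_tile(config) - base) / base


if __name__ == "__main__":
    for name in VCPUS_PER_VM:
        print(f"{name}: {vcpus_per_tile(name)} vCPUs per tile, "
              f"{percent_vs_standard(name):+.0f}% vs Config2")
```

Only the Webserver and Database rows differ between configurations, so the whole spread from 8 to 14 vCPUs per tile comes from those two VMs.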
We also used Windows Server 2008 instead of Windows Server 2003 where possible to characterize its behavior in anticipation of using Server 2008 in a next-generation benchmark. As a result, we increased the memory in the Javaserver VMs from 1GB to 1.4GB to ensure sufficient memory space for the JVM. The table below provides a summary of each VM’s configuration:
| Workload | Memory | Disk | OS |
|---|---|---|---|
| Mailserver | 1GB | 24GB | Windows 2003 32bit |
| Javaserver | 1.4GB | 12GB (*) | Windows 2008 64bit |
| Standby Server | 256MB (*) | 12GB (*) | Windows 2008 32bit |
| Webserver | 512MB | 8GB | SLES 10 SP2 64bit |
| Database | 2GB | 10GB | SLES 10 SP2 64bit |
| Fileserver | 256MB | 8GB | SLES 10 SP2 32bit |
Below is a basic summary of the hardware used:
- Dell PowerEdge R905 with 4 x 2.6GHz Quad Core AMD Opteron 8382
- Firmware version 3.0.2 (latest available)
- 128GB DDR2 Memory
- 2 x Intel E1000 dual-port NIC
- 2 x QLogic 2462 dual-port 4Gb
- 2 x EMC CX3-80 Storage Arrays
- 15 x HP DL360 client systems
Experimental Results
Figure 1 below shows both the CPU utilization and the throughput scaling normalized to the single-tile throughput of Config1. Both throughput and CPU utilization remain roughly equal for all three configurations at load levels of 1, 3, and 6 tiles (6, 18, and 36 VMs, respectively). The cost of using SMP VMs is negligible here. The throughputs remain roughly equal while the CPU utilization curves begin to diverge as the load increases to 9, 10, and 11 tiles (54, 60, and 66 VMs, respectively). Furthermore, all three configurations achieve roughly linear scaling up to 11 tiles (66 VMs). CPU utilization when running 11 tiles was 85%, 90%, and 93% for Config1, Config2, and Config3, respectively. Considering that few customers are comfortable running at overall system utilizations above 85%, this result shows remarkable scheduler performance and limited SMP co-scheduling overhead within a typical operating regime.
Figure 2 below shows the same normalized throughput as Figure 1, along with the total number of running vCPUs, to illustrate the additional stress put on the hypervisor by the progressively larger SMP configurations. For instance, the throughput scaling at nine tiles is equivalent despite the fact that Config1 requires only 72 vCPUs while Config3 uses 126 vCPUs. As expected, Config3, with its heavier resource demands, is the first to transition into system saturation. This occurs at a load of 12 tiles (72 VMs). At 12 tiles, there are 168 vCPUs active, 48 more than Config2 uses at 12 tiles. Nevertheless, Config3 scaling lags Config2 by only 9% and Config1 by 8%. Config2 reaches system saturation at 14 tiles (84 VMs), where it lags Config1 by 5%. Finally, Config1 hits the saturation point at 15 tiles (90 VMs).
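The VM and vCPU counts quoted for each load level can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions: the figure of 16 physical cores is derived from the hardware list (4 sockets x quad-core), the vCPU-to-core ratio is an illustration added here rather than a metric reported in the study, and all names are hypothetical.

```python
# Per-tile totals from the vCPU allocation table.
VCPUS_PER_TILE = {"Config1": 8, "Config2": 10, "Config3": 14}

# Each tile holds six VMs (1 tile = 6 VMs, 12 tiles = 72 VMs in the text).
VMS_PER_TILE = 6

# Assumption: 4 sockets x 4 cores, derived from the hardware summary.
PHYSICAL_CORES = 4 * 4


def total_vms(tiles: int) -> int:
    """Number of VMs running at a given tile count."""
    return tiles * VMS_PER_TILE


def total_vcpus(config: str, tiles: int) -> int:
    """Number of vCPUs running at a given tile count for one configuration."""
    return tiles * VCPUS_PER_TILE[config]


def vcpus_per_core(config: str, tiles: int) -> float:
    """Illustrative vCPU-to-physical-core ratio (not reported in the study)."""
    return total_vcpus(config, tiles) / PHYSICAL_CORES


if __name__ == "__main__":
    for tiles in (9, 12, 14, 15):
        for config in VCPUS_PER_TILE:
            print(f"{tiles:2d} tiles, {config}: {total_vms(tiles)} VMs, "
                  f"{total_vcpus(config, tiles)} vCPUs, "
                  f"{vcpus_per_core(config, tiles):.2f} vCPUs per core")
```

Under these assumptions, Config3 at 12 tiles runs more than ten vCPUs per physical core, which gives a sense of how much scheduling pressure the larger SMP configuration creates at its saturation point.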
Overall, these results show that ESX 4.0 effectively and fairly manages VMs of all shapes and sizes in a mixed-workload environment. ESX 4.0 also exhibits excellent throughput parity and minimal CPU differences between the three configurations throughout the typical operating envelope. ESX continues to demonstrate first-class enterprise stability, robustness, and predictability in all cases. Considering how well ESX 4.0 handles a tough situation like this, users can have confidence when virtualizing their larger workloads within larger VMs.
(*) The spartan memory and disk allocations for the Windows Server 2008 VMs might cause readers to question whether the virtual machines were adequately provisioned. Since our internal testing covers a wide array of virtualization platforms, reducing the memory of the Standby Server enables us to measure the peak performance of the server before encountering memory bottlenecks on virtualization platforms where physical memory is limited and sophisticated memory overcommit techniques are unavailable. Likewise, we want to configure our tests so that storage capacity doesn’t induce an artificial bottleneck. Neither the Standby Server nor the Javaserver places significant demands on its virtual disk, allowing us to optimize storage usage. We carefully compared this spartan Windows Server 2008 configuration against a richly configured Windows Server 2008 tile and found no measurable difference in stability or performance. Of course, I would not encourage this type of configuration in a live production setting. On the other hand, if a VM gets configured in this way, vSphere users can sleep well knowing that ESX won’t let them down.