Two-Host Matched-Pair Scaling Utilizing VMmark 2

As mentioned in Bruce’s previous blog, VMmark 2.0 has been released. With its release we can now begin to benchmark an enterprise-class cloud platform in entirely new and interesting ways. VMmark 2 is based on a multi-host configuration that includes bursty application and infrastructure workloads to drive load against a cluster. VMmark 2 allows for the analysis of infrastructure operations within a controlled benchmark environment for the first time, distinguishing it from server consolidation benchmarks.

Leading off a series of new articles introducing VMmark 2, this article provides a bit more detail about VMmark 2 and tests a vSphere enterprise cloud, focusing on the scaling performance of a matched pair of systems. Put more simply, this blog looks at what happens to cluster performance as more load is added to a pair of identical servers. This matters because it provides a means of identifying the efficiency of a vSphere cluster as demand increases.

VMmark 2 Overview

VMmark 2 is a next-generation, multi-host virtualization benchmark that models not only application performance but also the effects of common infrastructure operations. It models application workloads using the now-familiar tile-based approach of VMmark 1, where the benchmarker adds tiles until either a goal is met or the cluster reaches saturation. It’s important to note that while adding tiles increases the application workload requests roughly linearly, the load caused by infrastructure operations does not scale in the same way. VMmark 2 infrastructure operations instead scale with the size of the cluster, to better reflect modern datacenters. Greater detail on workload scaling can be found in the benchmarking guide available for download. The final VMmark 2 score is a weighted average of the two kinds of workloads; hence scores will not increase linearly as tiles are added. In addition to the throughput metrics, quality-of-service (QoS) metrics are also measured, and minimum standards must be maintained for a result to be considered fully compliant.
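To make the non-linear scoring concrete, here is a minimal Python sketch of a weighted average of the two sub-metrics. The function name `weighted_score` and the 0.8 application weight are illustrative assumptions; this post does not state the actual weights, so consult the benchmarking guide for the real formula.

```python
def weighted_score(app_score, infra_score, app_weight=0.8):
    """Weighted average of normalized application and infrastructure scores.

    app_weight is a placeholder assumption, not a value taken from this post.
    """
    if not 0.0 <= app_weight <= 1.0:
        raise ValueError("app_weight must be between 0 and 1")
    return app_weight * app_score + (1.0 - app_weight) * infra_score


# Hypothetical inputs: the application score grows with the tile count
# while the infrastructure score stays flat at its baseline of 1.0.
for tiles in (1, 2, 4):
    overall = weighted_score(app_score=float(tiles), infra_score=1.0)
    print(f"{tiles} tile(s): overall score {overall:.2f}")
```

With these assumed inputs, quadrupling the application score raises the overall score by less than a factor of four, because the flat infrastructure term holds the average down.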

VMmark 2 runs the application workloads and the infrastructure operations simultaneously, so the results it reports reflect both of these critical aspects. The application workloads that make up a VMmark 2 tile were chosen to better reflect applications in today’s datacenters by employing more modern and diverse technologies. In addition to the application workloads, VMmark 2 makes infrastructure operation requests of the cluster. These operations stress the cluster with vMotion, Storage vMotion, and Deploy operations. It’s important to note that while the VMmark 2 harness is stressing the cluster through the infrastructure operations, VMware’s Distributed Resource Scheduler (DRS) is dynamically managing the cluster in order to distribute and balance the available computing resources. The diagrams below summarize the key aspects of the application and infrastructure workloads.

VMmark 2 Workload Details:

Application Workloads – Each “Tile” consists of the following workloads and VMs.

• DVD Store 2.1 - multi-tier OLTP workload consisting of a database VM and three web server VMs driving a bursty load profile

• Exchange 2007

• Standby Server (heartbeat server)

• Olio - multi-tier social networking workload consisting of a web server and a database server
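As a quick sizing aid, the tile composition above can be tallied in a few lines of Python. This is only a sketch: the DVD Store 2.1 and Olio counts follow the list, while treating Exchange 2007 and the Standby Server as one VM each is an assumption, and the names `TILE_VMS` and `vms_for_tiles` are hypothetical.

```python
# VMs per workload in a single tile.
TILE_VMS = {
    "DVD Store 2.1": 4,   # one database VM plus three web server VMs (from the list)
    "Exchange 2007": 1,   # assumed to be a single VM
    "Standby Server": 1,  # assumed to be a single VM
    "Olio": 2,            # one web server plus one database server (from the list)
}


def vms_for_tiles(tiles):
    """Total workload VMs needed to run the given number of tiles."""
    if tiles < 0:
        raise ValueError("tiles must be non-negative")
    return tiles * sum(TILE_VMS.values())


for tiles in range(1, 6):
    print(f"{tiles} tile(s): {vms_for_tiles(tiles)} workload VMs")
```

Under these assumptions each tile adds eight workload VMs to the cluster, which gives a rough sense of how quickly the VM count grows as tiles are added.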

Infrastructure Workloads – Consist of the following operations.

• User-initiated vMotion

• Storage vMotion

• Deploy: VM cloning, OS customization, and updating

• DRS-initiated vMotion to accommodate host-level load variations

Environment Configuration:

Systems Under Test : 2 HP ProLiant DL380 G6
CPUs : 2 Quad-Core Intel® Xeon® CPU 5570 @ 2.93 GHz with HyperThreading Enabled
Memory : 96GB DDR2 Reg ECC
Storage Array : EMC CX380
Hypervisor : VMware ESX 4.1
Virtualization Management : VMware vCenter Server 4.1.0

Testing Methodology:

To test scalability as the number of VMmark 2 tiles increases, two HP ProLiant DL380 servers were configured identically and connected to an EMC CLARiiON CX380 storage array. The minimum configuration for VMmark 2 is a two-host cluster running 1 tile, so this was our baseline, and all VMmark 2 scores were normalized to this result. A series of tests was then conducted on this two-host configuration, increasing the number of tiles until the cluster approached saturation and recording both the VMmark 2 score and the average cluster CPU utilization during the run phase.
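The normalization step described above can be sketched in Python as follows. The helper names are hypothetical and the numbers in `example_raw` are placeholders chosen only to show the arithmetic; they are not results from these tests.

```python
def normalize_to_baseline(raw_scores, baseline_tiles=1):
    """Divide every raw score by the score of the baseline tile count."""
    baseline = raw_scores[baseline_tiles]
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return {tiles: score / baseline for tiles, score in sorted(raw_scores.items())}


def scaling_efficiency(normalized):
    """Fraction of ideal linear scaling achieved at each tile count."""
    return {tiles: value / tiles for tiles, value in normalized.items()}


# Placeholder raw scores keyed by tile count (hypothetical, NOT measured results).
example_raw = {1: 2.0, 2: 3.8, 3: 5.4}
normalized = normalize_to_baseline(example_raw)
efficiency = scaling_efficiency(normalized)
for tiles in normalized:
    print(f"{tiles} tile(s): normalized={normalized[tiles]:.2f}, "
          f"efficiency={efficiency[tiles]:.0%}")
```

The efficiency figure is most meaningful when applied to the application sub-metric, since the overall score is not expected to scale linearly with tile count.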

Results:

When demand on a cluster increases, it becomes critical to understand how the environment adapts in order to plan for future needs. In many cases it can be especially important for businesses to understand how the application and infrastructure workloads are individually affected. By breaking out the distinct VMmark 2 sub-metrics we can get a fine-grained view of how the vSphere cluster responded as the number of tiles, and thus the work performed, increased.

From the graph above we see that the VMmark 2 scores show significant gains until the two-host cluster became saturated at 5 Tiles. Delving into this further, we see that, as expected, the infrastructure operations remained nearly constant because the requested infrastructure load did not change during the experiments. Continued examination shows that the cluster was able to achieve nearly linear scaling for the application workloads through 4 Tiles. This is equivalent to 4 times the application work requested of the 1 Tile configuration. When we reached the 5 Tile configuration, the cluster was unable to meet the minimum quality-of-service requirements of VMmark 2. Even so, this result still helps us to understand the performance characteristics of the cluster.

Monitoring how the average cluster CPU utilization changed during the course of our experiments is another critical component of understanding cluster behavior as load increases. The diagram below plots the VMmark 2 scores shown in the above graph alongside the average cluster CPU utilization for each configuration.

The resulting diagram helps to illustrate how adding VMmark 2 Tiles, and thus increasing the work done by our cluster, affected cluster CPU utilization and performance. The results show that VMware’s vSphere matched-pair cluster was able to deliver outstanding scaling of enterprise-class applications while also providing unequaled flexibility in the load balancing, maintenance, and provisioning of our cloud. This is just the beginning of what we’ll see in terms of analysis using the newly released VMmark 2. We plan to explore larger and more diverse configurations next, so stay tuned …