Experimenting with Cluster Scale-Out Utilizing VMmark 2

The first article in our VMmark 2 series gave an in-depth introduction to the benchmark and presented results on the scaling performance of a cluster based on a matched pair of systems. The goal of this article is to continue characterizing larger and more diverse cloud configurations by testing the scale-out performance of an expanding vSphere cluster. This blog explores an enterprise-class cluster’s performance as more servers are added and the amount of requested work is subsequently increased. Determining the impact of adding hosts to a cluster is important because it allows the total work being done to be measured, within a controlled environment, as cluster capacity and workload demand increase. It also helps identify how efficiently a vSphere-managed cluster can utilize an increasing number of hosts.

VMmark 2 Overview:

VMmark 2 is a next-generation, multi-host virtualization benchmark that models not only application performance but also the effects of common infrastructure operations. VMmark 2 is a combination of the application workloads and the infrastructure operations running simultaneously.  Although the application workload levels are scaled up by the addition of tiles, the infrastructure operations scale as the cluster size increases.  In general, the infrastructure operations increase with the number of hosts in an N/2 fashion, where N is the number of hosts.  To calculate the score for VMmark 2, final results are generated from a weighted average of the two kinds of workloads; hence scores will not increase linearly as tiles are added.  For more general information on VMmark 2, including the application and infrastructure workload details, take a look at the expanded overview in my previous blog post.
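The two scaling rules described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and because this article does not state the actual weights, the weight is left as a caller-supplied parameter and the 0.8 used in the example is an assumed placeholder.

```python
def infrastructure_load_factor(num_hosts: int) -> float:
    """Relative infrastructure-operation load for a cluster of num_hosts.

    Follows the N/2 rule of thumb from the text. The function name is
    hypothetical and not part of VMmark 2 itself.
    """
    if num_hosts < 2:
        raise ValueError("the minimum VMmark 2 configuration is a two-host cluster")
    return num_hosts / 2


def combined_score(app_score: float, infra_score: float, app_weight: float) -> float:
    """Weighted average of the application and infrastructure components.

    app_weight is an assumed, caller-supplied value in [0, 1]; the article
    does not state the weights that VMmark 2 actually uses.
    """
    if not 0.0 <= app_weight <= 1.0:
        raise ValueError("app_weight must be between 0 and 1")
    return app_weight * app_score + (1.0 - app_weight) * infra_score


if __name__ == "__main__":
    weight = 0.8  # illustrative placeholder, not taken from this article
    one = combined_score(1.0, 1.0, weight)
    two = combined_score(2.0, 1.0, weight)
    # Doubling only the application component less than doubles the result.
    print(f"score ratio when only the application component doubles: {two / one:.2f}")
```

With any weight below 1, doubling the application component alone raises the combined value by less than a factor of two, which is why scores do not grow linearly with the number of tiles.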

Environment Configuration:

  • Systems Under Test : 2-5 HP ProLiant DL380 G6
  • CPUs : 2 Quad-Core Intel® Xeon® CPU 5570 @ 2.93 GHz with HyperThreading Enabled
  • Memory : 96GB DDR2 Reg ECC
  • Hypervisor : VMware ESX 4.1
  • Virtualization Management : VMware vCenter Server 4.1

Testing Methodology:

To test scale-out performance with VMmark 2, five identically configured HP ProLiant DL380 servers were connected to an EMC CLARiiON CX3-80 storage array. The minimum configuration for VMmark 2 is a two-host cluster running one tile. The result from this minimal configuration served as the baseline, and all VMmark 2 scalability data in this article were normalized to that score. A series of tests was then conducted on this two-host configuration, increasing the number of tiles until the cluster approached saturation. As shown in the series’ first article, our two-host cluster approached saturation at four tiles and failed QoS requirements when running five tiles. Starting with a common workload level of four tiles, the three-host, four-host, and five-host configurations were tested in a similar fashion, increasing the number of tiles until each configuration approached saturation. Saturation was defined as the point at which the cluster was unable to meet the minimum quality-of-service requirements for VMmark 2. For all testing, we recorded both the VMmark 2 score and the average cluster CPU utilization during the run phase.
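The normalization and the search for the saturation point can be expressed as a small sketch. The helper names and the raw score values below are hypothetical placeholders, not measured results. The pass/fail pattern for the two-host cluster assumes every tile count from one to four passed QoS, which is consistent with, but not spelled out in, the description above.

```python
from typing import Dict, Optional


def normalize_scores(raw_scores: Dict[str, float], baseline_key: str) -> Dict[str, float]:
    """Divide every raw score by the baseline run's score."""
    baseline = raw_scores[baseline_key]
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return {name: score / baseline for name, score in raw_scores.items()}


def last_passing_tiles(qos_passed_by_tiles: Dict[int, bool]) -> Optional[int]:
    """Return the highest tile count that met QoS before the first failure.

    Returns None if even the smallest tile count failed.
    """
    last_ok = None
    for tiles in sorted(qos_passed_by_tiles):
        if not qos_passed_by_tiles[tiles]:
            break
        last_ok = tiles
    return last_ok


if __name__ == "__main__":
    # Placeholder raw scores, invented purely to show the arithmetic.
    raw = {"2 hosts / 1 tile": 2.0, "2 hosts / 4 tiles": 5.0}
    print(normalize_scores(raw, "2 hosts / 1 tile"))

    # Two-host cluster: assumed to pass through four tiles, failed at five.
    two_host_qos = {1: True, 2: True, 3: True, 4: True, 5: False}
    print(last_passing_tiles(two_host_qos))
```

Normalizing to the two-host, one-tile run makes every other data point read directly as a multiple of the smallest valid configuration.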


Organizations often outgrow existing hardware capacity, and it can become necessary to add one or more hosts to relieve performance bottlenecks and meet increasing demands. VMmark 2 was used to measure such a scenario by keeping the load constant as new hosts were incrementally added to the cluster. The starting point for the experiments was four tiles. At this load level the two hosts had approached saturation, with nearly 90% CPU utilization. The test then determined the impact on cluster CPU utilization and performance of adding identical hosts to the available cluster resources.


As expected, scoring gains were easily achieved by adding hosts until the environment was generating approximately the maximum scores for the four-tile load level, as CPU resources became more plentiful. In comparison to the two-host configuration, the normalized scores increased 6%, 12%, and 12% for the three-host, four-host, and five-host configurations, respectively. The configurations with additional hosts were able to generate more throughput while also reducing the average cluster CPU utilization, because the requested work was spread over more systems. This highlights the additional CPU capacity held in reserve by the cluster at each data point. By charting two or more points at the same load level, it is much easier to approximate the expected average CPU utilization after adding new hosts to the cluster. This data, combined with established CPU usage thresholds, can make additional purchasing or system allocation decisions more straightforward.
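As a rough planning aid, the kind of approximation described above can be sketched as follows. The model is an assumption, not the method used in these tests: it treats the requested work as fixed and spreads it evenly, so average utilization scales with the inverse of the host count. It ignores per-host overhead and the extra throughput that additional hosts delivered, so measured utilization will differ. The function names and the 70% threshold are hypothetical.

```python
import math


def projected_utilization(current_util_pct: float, current_hosts: int, new_hosts: int) -> float:
    """First-order estimate of average cluster CPU utilization after resizing.

    Assumes a fixed amount of work spread evenly across identical hosts.
    """
    if current_hosts <= 0 or new_hosts <= 0:
        raise ValueError("host counts must be positive")
    return current_util_pct * current_hosts / new_hosts


def hosts_needed(current_util_pct: float, current_hosts: int, threshold_pct: float) -> int:
    """Smallest host count whose projected utilization is at or below the threshold."""
    if threshold_pct <= 0:
        raise ValueError("threshold must be positive")
    required = math.ceil(current_util_pct * current_hosts / threshold_pct)
    return max(current_hosts, required)


if __name__ == "__main__":
    # Starting point from the text: two hosts at roughly 90% with four tiles.
    for hosts in (3, 4, 5):
        print(hosts, round(projected_utilization(90.0, 2, hosts), 1))
    # Hypothetical 70% utilization threshold.
    print(hosts_needed(90.0, 2, 70.0))
```

Comparing such an estimate against two or more measured points at the same load level shows how far the real cluster deviates from the idealized even-spread model.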

The above analysis looks at scale-out performance for an expanding cluster with a fixed amount of work. To get the whole picture, it’s necessary to measure performance and available capacity as both the load and the number of hosts increase. Specifically, as we progress through each of the configurations, do the reduction in cluster CPU utilization and the improved performance measured in the previous experiment hold true for varied amounts of load and numbers of hosts?

[Figures: VMmark 2 scores with scaling hosts; VMmark 2 cluster CPU utilization with scaling hosts]

As shown in the above graphs, the vSphere-based cloud integrated new hosts into our testing environment without difficulty and delivered consistent returns on our physical server investments. It’s important to note that in both the two-host and three-host configurations, the test failed at least one of the quality-of-service (QoS) requirements when the cluster reached saturation. Also important, the five-host configuration was not run out to saturation due to a lack of additional client hardware. During our testing, the addition of each host produced the expected scaling of VMmark 2 scores. As we went through each of the configurations, the normalized scores increased an average of 13%, 13%, and 16% for the three-host, four-host, and five-host configurations, respectively. Each of the configurations exhibited nearly linear scaling of CPU utilization as load was increased. Based on these results, the VMware vSphere-managed cluster was able to generate significant performance scaling while also utilizing the additional capacity of newly provisioned hosts quite efficiently.

Thus far, all VMmark 2 studies have involved homogeneous clusters of identical servers. Stay tuned for experimentation utilizing varying storage and/or networking solutions as well as heterogeneous clusters…


