Exploring Generational Scaling with VMmark 2.1

The steady march of technological improvements is nothing new.  As companies expand or refresh their datacenters, quantifying the return on hardware investments often becomes a non-trivial task.  The difficulty is compounded when the question is no longer how well one new server will perform relative to its predecessor, but how well a new cluster will perform relative to the previous one.  With this in mind, we set out to explore the generational scaling performance of two clusters made up of very similar hardware using the newly released VMmark 2.1.  VMmark 2.1 is a next-generation, multi-host virtualization benchmark that models not only application performance but also the effects of common infrastructure operations.  For more general information on VMmark 2.1, including the application and infrastructure workload details, take a look at the expanded overview in one of my previous blog posts.

Environment Configuration:

  • Clusters Under Test
    • Cluster 1
      • Systems Under Test: 2 x Dell PowerEdge R805
      • CPUs: 2 Six-Core AMD Opteron™ 2427 @ 2.2 GHz
      • Memory: 128GB DDR2 Reg ECC @ 533MHz
      • Storage Array: EMC CX4-120
      • Hypervisor: VMware ESX 4.1
      • Virtualization Management: VMware vCenter Server 4.1
    • Cluster 2
      • Systems Under Test: 2 x Dell PowerEdge R815
      • CPUs: 2 Twelve-Core AMD Opteron™ 6174 @ 2.2 GHz
      • Memory: 128GB DDR3 Reg ECC @ 1066MHz
      • Storage Array: EMC CX4-120
      • Hypervisor: VMware ESX 4.1
      • Virtualization Management: VMware vCenter Server 4.1
  • VMmark 2.1

Testing Methodology:

To measure the generational improvement of the two clusters under test, every attempt was made to set up and configure the servers identically.  The minimum configuration for VMmark 2.1 is a two-host cluster running a single tile.  The result from this minimal configuration on the older cluster, Cluster #1, was used as the baseline, and all VMmark 2.1 scalability data in this article were normalized to that score.  A series of tests was then conducted on each cluster in isolation, increasing the number of tiles until the cluster approached saturation.  Saturation was defined as the point where the cluster was unable to meet the minimum quality-of-service (QoS) requirements for VMmark 2.1.  Results that did not meet the minimum QoS requirements were not plotted.
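The normalization and saturation rules described above can be expressed as a short Python sketch. The `Run` record, its field names, and the helper functions are hypothetical illustrations, not part of the VMmark 2.1 tooling or its result format; the sketch only shows the arithmetic of dividing each passing score by the one-tile Cluster #1 baseline and stopping at the first tile count that misses QoS.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run (hypothetical record layout, not VMmark's output format)."""
    tiles: int
    score: float     # raw benchmark score for this run
    meets_qos: bool  # True if every workload met its minimum QoS


def normalize(runs, baseline_score):
    """Return {tiles: score / baseline} for runs that met QoS; failing runs are dropped."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return {r.tiles: r.score / baseline_score for r in runs if r.meets_qos}


def max_supported_tiles(runs):
    """Walk tile counts upward and return the last one before the first QoS failure."""
    best = None
    for r in sorted(runs, key=lambda run: run.tiles):
        if not r.meets_qos:
            break  # saturation reached: stop adding tiles
        best = r.tiles
    return best
```

Stopping at the first failure mirrors the incremental methodology: tiles were added one at a time until QoS could no longer be met, so any run past that point is not counted.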


The primary difference between the two clusters, and the predominant factor in the generational scaling, is the processors.  The AMD Opteron™ 2427 processors provide six cores per socket for a total of twelve logical processors per server, whereas the newer AMD Opteron™ 6174 processors have twelve cores per socket for a total of twenty-four logical processors per server.  Factor in a doubling of the L3 cache per socket, as well as a doubling of memory speed (from 533MHz DDR2 to 1066MHz DDR3), and the change in server characteristics is quite significant.
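The per-host arithmetic in the paragraph above can be checked with a small Python sketch. The dictionary layout and function names are illustrative assumptions; the socket counts, core counts, and memory speeds are copied from the Environment Configuration list. L3 cache size is left out because the configuration list does not state it.

```python
# Per-host figures copied from the Environment Configuration list.
# The dictionary layout itself is only an illustration.
OLD_HOST = {"model": "Dell PowerEdge R805", "sockets": 2, "cores_per_socket": 6, "mem_mhz": 533}
NEW_HOST = {"model": "Dell PowerEdge R815", "sockets": 2, "cores_per_socket": 12, "mem_mhz": 1066}
HOSTS_PER_CLUSTER = 2


def cores_per_host(host):
    """Cores in one server: sockets x cores per socket."""
    return host["sockets"] * host["cores_per_socket"]


def cores_per_cluster(host, hosts=HOSTS_PER_CLUSTER):
    """Cores across a cluster of identical hosts."""
    return cores_per_host(host) * hosts


def generational_ratios(old, new):
    """New-to-old ratios for core count and memory speed."""
    return {
        "cores": cores_per_host(new) / cores_per_host(old),
        "memory_speed": new["mem_mhz"] / old["mem_mhz"],
    }


if __name__ == "__main__":
    print(cores_per_host(OLD_HOST), cores_per_host(NEW_HOST))        # 12 24
    print(cores_per_cluster(OLD_HOST), cores_per_cluster(NEW_HOST))  # 24 48
    print(generational_ratios(OLD_HOST, NEW_HOST))  # {'cores': 2.0, 'memory_speed': 2.0}
```

Since both processors run at 2.2 GHz, the core-count ratio is also the ratio of aggregate clock capacity per host.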


As shown in the above graph, the generational scaling between the two clusters under test is significant.  In the one-tile case, both clusters were able to perform the requested work without encountering resource constraints.  The performance advantage of the newer cluster became more apparent as we scaled up the number of tiles, significantly increasing the level of CPU overcommitment and utilization.  It’s important to note that while adding tiles increases the application workload effectively linearly, the workload generated by infrastructure operations does not scale the same way and was held constant across all tests.  Cluster #1 scaled to three tiles, at which point it was saturated and unable to support additional tiles while continuing to meet the minimum QoS requirements of the benchmark.  For comparison, Cluster #2 achieved increases in normalized VMmark 2.1 score of 1%, 14%, and 9% for the one-tile, two-tile, and three-tile configurations, respectively.  Cluster #2 was then scaled to seven tiles, beyond which it was unable to meet the minimum QoS requirements.
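The percentages above compare normalized scores at matching tile counts, and the overall gain compares each cluster's best passing result. A minimal Python sketch of both comparisons follows; the function names are hypothetical, and the example inputs are placeholder values, not the measured results reported in this article.

```python
def percent_improvement(old, new):
    """Percent gain of `new` over `old` at each tile count both clusters completed.

    `old` and `new` map tile count -> normalized score (passing runs only).
    """
    common = sorted(old.keys() & new.keys())
    return {t: (new[t] / old[t] - 1.0) * 100.0 for t in common}


def peak_ratio(old, new):
    """Ratio of the best normalized score on each cluster (e.g. 7 tiles vs 3 tiles)."""
    return max(new.values()) / max(old.values())


if __name__ == "__main__":
    # Placeholder normalized scores, NOT the results reported in this article.
    cluster1 = {1: 1.00, 2: 2.00}
    cluster2 = {1: 1.50, 2: 3.00, 3: 4.00}
    print(percent_improvement(cluster1, cluster2))  # {1: 50.0, 2: 50.0}
    print(peak_ratio(cluster1, cluster2))           # 2.0
```

Only tile counts present in both maps are compared, because a tile count that failed QoS on one cluster has no plotted score to compare against.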

The newer-generation cluster, with two AMD Opteron™ 6174-based Dell PowerEdge R815 hosts running vSphere 4.1, exhibited excellent scaling as the load increased up to seven tiles, more than doubling the previous-generation cluster’s performance and work accomplished.  Because VMmark 2.1 not only runs heterogeneous applications across a diverse computing environment but also measures the impact of commonplace infrastructure operations, it provided valuable insight into the generational scaling of the two clusters.  VMmark 2.1 proved itself a capable benchmark for answering previously difficult datacenter questions.





