
Author Archives: Joshua Schnee

Introducing VMmark ML

VMmark has been the go-to virtualization benchmark for over 12 years. It’s been used by partners, customers, and internally in a wide variety of technical applications. VMmark1, released in 2007, was the de facto virtualization consolidation benchmark at a time when the overhead and feasibility of virtualization were still largely in question. In 2010, as server consolidation became less of an “if” and more of a “when,” VMmark2 introduced more of the rich vSphere feature set by incorporating infrastructure workloads (vMotion, Storage vMotion, and Clone & Deploy) alongside complex application workloads like DVD Store. Fast forward to 2017, and we released VMmark3, which builds on the previous versions by integrating an automated deployment service alongside complex multi-tier modern application workloads like Weathervane. To date, across all generations, we’ve had nearly 300 VMmark result publications (297 at the time of this writing) and countless internal performance studies.

Unsurprisingly, tech industry environments have continued to evolve, and so must the benchmarks we use to measure them. It’s in this vein that the VMware VMmark performance team has begun experimenting with other use cases that don’t quite fit the “traditional” VMmark benchmark. One example of a non-traditional use is Machine Learning and its execution within Kubernetes clusters. At the time of this writing, nearly 9% of the VMworld 2019 US sessions are about ML and Kubernetes. As such, we thought this might be a good time to provide an early teaser to VMmark ML and even point you at a couple of other performance-centric Machine Learning opportunities at VMworld 2019 US.

Although it’s very early in the VMmark ML development cycle, we understand that there’s a need for push-button-easy, vSphere-based Machine Learning performance analysis. As an added bonus, our prototype runs within Kubernetes, which we believe to be well-suited for this type of performance analysis.

Our internal-only VMmark ML prototype is currently streamlined to perform a limited set of operations well while we work with partners, customers, and internal teams on how VMmark ML should be exercised. It is able to:

  1. Rapidly deploy Kubernetes within a vSphere environment.
  2. Deploy a variety of containerized ML workloads within our newly created VMmark ML Kubernetes cluster.
  3. Execute these ML workloads either in isolation or concurrently to determine the performance impact of architectural, hardware, and software design decisions.
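As a rough illustration of the third point, the isolation-versus-concurrent execution idea can be sketched in a few lines of Python. Everything here is a placeholder: the workload names and the runner function stand in for containerized jobs submitted to the Kubernetes cluster, not actual prototype code.

```python
from concurrent.futures import ThreadPoolExecutor

def run_workload(name):
    # Stand-in for launching a containerized ML workload and waiting for
    # it to finish; in the real prototype this would be a Kubernetes job.
    return f"{name}: done"

def run_in_isolation(workloads):
    # One at a time: measures each workload free of interference.
    return [run_workload(w) for w in workloads]

def run_concurrently(workloads):
    # All at once: surfaces contention for CPU, memory, and I/O.
    with ThreadPoolExecutor(max_workers=len(workloads)) as pool:
        return list(pool.map(run_workload, workloads))
```

Comparing the per-workload results from the two modes is what exposes the performance impact of hardware and software design decisions.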

VMmark ML development is still very fluid right now, but we decided to test some of these concepts and assumptions in a “real-world” situation. I’m fortunate to work alongside long-time DVD Store author and Big Data guru Dave Jaffe on VMmark ML. As he and Sr. Technical Marketing Architect Justin Murray were preparing for their VMworld US talk, “High-Performance Virtualized Spark Clusters on Kubernetes for Deep Learning [BCA1563BU],” we thought this would be a good opportunity to experiment with VMmark ML. Dave used the VMmark ML prototype to deploy a 4-node Kubernetes cluster onto a single vSphere host with a 2nd-Generation Intel® Xeon® Scalable (“Cascade Lake”) processor. VMmark ML then pulled a previously stored Docker container with several MLPerf workloads inside it. Finally, as a concurrent execution exercise, these workloads were run simultaneously, pushing the CPU utilization of the server above 80%. Additionally, Dave is speaking about vSphere Deep Learning performance in his talk “Optimize Virtualized Deep Learning Performance with New Intel Architectures [MLA1594BU],” where he and Intel Principal Engineer Padma Apparao explore the benefits of Vector Neural Network Instructions (VNNI). I definitely recommend either of these talks if you want a deep dive into the details of VNNI or Spark analysis.

Another great opportunity to learn about VMware Performance team efforts within the Machine Learning space is to attend the Hands-on-Lab Expert Lead Workshop, “Launch Your Machine Learning Workloads in Minutes on VMware vSphere [ELW-2048-01-EMT_U],” or take the accompanying lab. This is being led by another VMmark ML team member, Uday Kurkure, along with Staff Global Solutions Consultant Kenyon Hensler. (Sign up for the Expert Lead using the VMworld 2019 mobile application or on my.vmworld.com.)

Our goal after VMworld 2019 US is to continue discussions with partners, customers, and internal teams about how a benchmark like VMmark ML would be most useful. We also hope to complete our integration of Spark within Kubernetes on vSphere and reproduce some of the performance analysis done to date. Stay tuned to the performance blog for additional posts and details as they become available.

VMmark 3.1 Released

It is my great pleasure to announce that VMmark 3.1 is generally available as of February 7, 2019!

What’s New?

This release adds support for persistent memory, improves workload scalability, and better reflects secure customer environments by increasing side-channel vulnerability mitigation requirements.

Visit our main VMmark HTML page for more information.

Please note that VMmark 3.0 will reach end of life on March 15, 2019.

To learn more about VMmark3, see the introductory blog article here.

Addressing Meltdown/Spectre in VMmark

The recently described Meltdown/Spectre vulnerabilities have implications throughout the tech industry, and the VMmark virtualization benchmark is no exception. In deciding how to approach the issue, the VMmark team’s goal was to address the impact of these vulnerabilities while maintaining the value and integrity of the benchmark.

Applying the full set of currently available Meltdown/Spectre mitigations is likely to have a significant impact on VMmark scores. Because the mitigations are expected to continue evolving for some time, that impact might even change. If the VMmark team were to require the full set of mitigations in order for a submission to be compliant, that might make new submissions non-competitive with older ones, and also introduce more “noise” into VMmark scores as the mitigations evolve. While our intention for the future is that eventually all new VMmark results will be obtained on virtualization platforms that have the full set of Meltdown/Spectre mitigations, we have chosen to take a gradual approach.

Beginning May 8, 2018, all newly-published VMmark results must comply with a number of new requirements related to the Meltdown and Spectre vulnerabilities. These requirements are detailed in Appendix C of the latest edition of the VMmark User’s Guide.

Before performing any VMmark benchmark runs intended for publication, check the VMmark download page to make sure you’re using the latest edition of the VMmark User’s Guide.  If you have questions, you can reach the VMmark team at vmmark-info@vmware.com.

Introducing VMmark3: A highly flexible and easily deployed benchmark for vSphere environments

VMmark 3.0, VMware’s multi-host virtualization benchmark, is generally available here.  VMmark3 is a free cluster-level benchmark that measures the performance, scalability, and power of virtualization platforms.

VMmark3 leverages much of previous VMmark generations’ technologies and design.  It continues to utilize a unique tile-based heterogeneous workload application design. It also deploys the platform-level workloads found in VMmark2 such as vMotion, Storage vMotion, and Clone & Deploy.  In addition to incorporating new and updated application workloads and infrastructure operations, VMmark3 also introduces a new fully automated provisioning service that greatly reduces deployment complexity and time.


Measuring Cloud Scalability Using the Weathervane Benchmark

Cloud-based deployments continue to be a hot topic in many of today’s corporations.  Often the discussion revolves around workload portability, ease of migration, and service pricing differences.  In an effort to bring performance into the discussion we decided to leverage VMware’s new benchmark, Weathervane.  As a follow-on to Harold Rosenberg’s introductory Weathervane post we decided to showcase some of the flexibility and scalability of our new large-scale benchmark.  Previously, Harold presented some initial scalability data running on three local vSphere 6 hosts.  For this article, we decided to extend this further by demonstrating Weathervane’s ability to run within a non-VMware cloud environment and scaling up the number of app servers.

Weathervane is a new web-application benchmark architected to simulate modern-day web applications.  It consists of a benchmark application and a workload driver.  Combined, they simulate the behavior of everyday users attending a real-time auction.  For more details on Weathervane I encourage you to review the introductory post.

Environment Configuration:
Cloud Environment: Amazon AWS, US West.
Instance Types: m3.xlarge, m3.large, c3.large.
Instance Notes: Database instances utilized an additional 300GB io1 tier data disk.
Instance Operating System: CentOS 6.5 x64.
Application: Weathervane Internal Build 084.

Testing Methodology:
All instances were run within the same cloud environment to reduce network-induced latencies.  We started with a base configuration consisting of eight instances.  We then scaled out the number of workload drivers and application servers in an effort to identify how the cloud environment scaled as application workload needs increased.  We used Weathervane’s FindMax functionality, which runs a series of tests to determine the maximum number of users the configuration can sustain while still meeting QoS requirements.  It should be noted that early experimentation allowed us to identify the maximum needs of the services other than the workload drivers and application servers, reducing the likelihood of bottlenecks in those services.  Below is a block diagram of the configurations used for the scaled-out Weathervane deployment.
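The details of Weathervane’s FindMax procedure aren’t public, but the general idea can be sketched as a search over load levels. Everything below (the function name, signature, and step size) is an assumption for illustration only:

```python
def find_max(passes_qos, max_users, step=250):
    """Binary-search the largest user count (a multiple of `step`) that
    still meets QoS.  `passes_qos(users)` stands in for a complete
    benchmark run at that load level."""
    lo, hi = 0, max_users // step
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if passes_qos(mid * step):
            best = mid * step   # this load level met QoS; try higher
            lo = mid + 1
        else:
            hi = mid - 1        # QoS failed; back off
    return best
```

Each probe of `passes_qos` corresponds to a full benchmark run, which is why a search like this takes a series of tests rather than a single one.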


For our analysis of Weathervane cloud scaling, we ran multiple iterations at each load level and selected the average.  We automated the process to ensure consistency.  Our results show both the number of users sustained and the HTTP requests per second as reported by the benchmark harness.


As you can see in the above graph, for our cloud environment running Weathervane, scaling the number of application servers yielded nearly linear scaling up to five application servers. The delta in scaling between the number of users and the HTTP requests per second sustained was less than 1%.  Due to time constraints we were unable to test beyond five application servers, but we expect that scaling would have continued well beyond the load levels presented.

Although this is just a small sample of what Weathervane and cloud environments can scale to, this brief article highlights the scalability of both the benchmark and the cloud environment.  Though Weathervane hasn’t been released publicly yet, it’s easy to see how this type of controlled, scalable benchmark will assist in performance evaluations of a diverse set of environments.  Look for more Weathervane-based cloud performance analysis in the future.


Comparing Storage Density, Power, and Performance with VMmark 2.5

Datacenters continue to grow as the use of both public and private clouds becomes more prevalent.  A comprehensive review of density, power, and performance is becoming more crucial to understanding the tradeoffs when considering new storage technologies as a replacement for legacy solutions.  Expanding on previous articles comparing storage technologies and the IOPS performance available from flash-based storage, in this article we compare the density, power, and performance differences between traditional hard disk drives (HDDs) and flash-based storage.  As might be expected, we found that the flash-based storage performed very well in comparison to the traditional hard disk drives.  This article quantifies our findings.

In addition to VMmark’s previous performance measurement capability, VMmark 2.5 adds the ability to collect power measurements on servers and storage under test.  VMmark 2.5 is a multi-host virtualization consolidation benchmark that utilizes a combination of application workloads and infrastructure operations running simultaneously to model the performance of a cluster.  For more information on VMmark 2.5, see this overview.

Environment Configuration:
Hypervisor: VMware vSphere 5.1
Servers: Two x Dell PowerEdge R720
BIOS settings: High Performance Profile Enabled
CPU: Two x 2.9GHz Intel Xeon E5-2690
Memory: 192GB
HBAs: Two x 16Gb QLE2672 per system under test
– HDD-Configuration: EMC CX3-80, 120 disks, 8 Trays, 1 SPE, 30U
– Flash-Based-Configuration: Violin Memory 6616, 64 VIMMs, 3U
Workload: VMware VMmark 2.5.1

Testing Methodology:
For this experimentation we set up a vSphere 5.1 DRS-enabled cluster consisting of two identically configured Dell PowerEdge R720 servers.  A series of VMmark 2.5 tests were then conducted on the cluster with the same VMs being moved to the storage configuration under test, progressively increasing the number of tiles until the cluster reached saturation.  Saturation was defined as the point where the cluster was unable to meet the VMmark 2.5 quality-of-service (QoS) requirements. We selected the EMC CX3-80 and the Violin Memory 6616 as representatives of the previous generation of traditional HDD-based and flash-based storage, respectively. We would expect comparable arrays in these generations to have characteristics similar to what we measured in these tests.  In addition to the VMmark 2.5 results, esxtop data was collected to provide further statistics.  The HDD configuration running a single tile was used as the baseline and all VMmark 2.5 results in this article (excluding raw Watts metrics, %CPU, and Latency) were normalized to that result.
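The normalization step can be illustrated with a short sketch; the raw scores below are hypothetical, since published results report only normalized values:

```python
def normalize(results, baseline):
    # Express every score relative to the 1-tile baseline run.
    return {tiles: round(score / baseline, 2)
            for tiles, score in results.items()}

# Hypothetical raw scores keyed by tile count, for illustration only.
hdd_raw = {1: 1.22, 2: 2.39, 3: 3.51}
normalized = normalize(hdd_raw, baseline=hdd_raw[1])  # 1-tile run -> 1.0
```

Normalizing to a single baseline is what lets results from both storage configurations be plotted on one axis without exposing raw benchmark scores.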

Average Watts and VMmark 2.5 Performance Per Kilowatt Comparison:
For our comparison of the two technologies, the first point of evaluation was reviewing both the average watts required by the storage arrays and the corresponding VMmark 2.5 Performance Per Kilowatt (PPKW) score.  Note that the HDD configuration reached saturation at 7 tiles. In contrast, the Flash-based configuration was able to support a total of 9 tiles, while still meeting the quality of service requirements for VMmark 2.5.

As can be seen from the above graphs, the difference between the two technologies is extremely obvious.  The average watts drawn by the Flash-based configuration was nearly 50% less than the HDD configuration across all tiles tested.  Additionally, the PPKW score of the Flash-based configuration was on average 3.4 times higher than the HDD configuration, across all runs.

Application Score Comparison:
Due to the very large difference in PPKW, we decided to dig deeper into the potential root causes, beyond just the discrepancy in power consumed.  Because the application workloads exhibit random access patterns, as opposed to the sequential nature of infrastructure operations, we focused on the differences in application scores between the two configurations, as this is where we would expect to see the majority of the gains provided by the Flash-based configuration.

The difference between the scaling of the application workloads is quite obvious.  Although running the same number of tiles, and thus attempting the same amount of work, the flash-based configuration was able to produce application workload scores that were 1.9 times higher than the HDD configuration across 7 tiles.

CPU and Latency Comparison:
After exploring the power consumption and various areas of performance difference, we decided to look into two additional key components behind the performance improvements: CPU utilization and storage latency.

In our final round of data assessment we found that the CPU utilization of the flash-based configuration was on average 1.53 times higher than that of the HDD configuration across all 7 tiles.  Higher CPU utilization might appear to be sub-optimal; however, we determined that the systems were spending less time waiting for I/O to complete and were thus getting more work done.  This is especially visible when reviewing the storage latencies of the two configurations.  The flash-based configuration showed extremely flat latencies, on average less than one tenth of the HDD configuration’s.

Finally, when comparing the physical space requirements of the two configurations, the flash-based storage was effectively 92% denser than the traditional HDD configuration (achieving 9 tiles in 3U versus 7 tiles in 30U). In addition to physical density advancements, the flash-based storage allowed for a 29% increase in the number of VMs run on the same server hardware, while maintaining the QoS requirements of VMmark 2.5.
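The density figure follows from rack units consumed per tile; a quick check of the arithmetic:

```python
# Rack units consumed per VMmark 2.5 tile for each configuration.
hdd_u_per_tile   = 30 / 7   # EMC CX3-80: 7 tiles in 30U
flash_u_per_tile = 3 / 9    # Violin 6616: 9 tiles in 3U

# Fractional reduction in rack space needed per tile.
density_gain = 1 - flash_u_per_tile / hdd_u_per_tile
print(f"{density_gain:.0%}")  # → 92%
```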

The flash-based storage showed wins across the board for power and performance.  The flash-based storage consumed half the power while achieving over three times the performance.  Although the initial costs of flash-based storage can be somewhat daunting when compared to traditional HDD storage, the reduction in power, increased density, and superior performance of the flash-based storage certainly seems to provide a strong argument for integrating the technology into future datacenters. VMmark 2.5 gives us the ability to look at the larger picture, making an informed decision across a wide variety of today’s concerns.

vSphere 5.1 IOPS Performance Characterization on Flash-based Storage

At VMworld 2012 we demonstrated a single eight-way VM running on vSphere 5.1 exceeding one million IOPS.  This testing illustrated the high end IOPS performance of vSphere 5.1.

In a new series of tests we have completed some additional characterization of high I/O performance using a very similar environment. The only difference between the 1 million IOPS test environment and the one used for these tests is that the number of Violin Memory Arrays was reduced from two to one (one of the arrays was a short term loan).

Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: Two Intel Xeon E5-2690, HyperThreading disabled
Memory: 256GB
HBAs: Five QLogic QLE2562
Storage: One Violin Memory 6616 Flash Memory Array
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Configuration: Random, 4KB I/O size with 16 workers

We continued to characterize the performance of vSphere 5.1 and the Violin array across a wider range of configurations and workload conditions.

Based on the types of questions that we often get from customers, we focused on RDM versus VMFS5 comparisons and the usage of various I/O sizes.  In the first series of experiments we compared RDM versus VMFS5 backed datastores using 100% read workload mix while ramping up the I/O size.


As you can see from the above graph, VMFS5 yielded roughly equivalent performance to RDM-backed datastores.  Comparing the average of the deltas across all data points showed performance within 1% of RDM for both IOPS and MB/s.  As expected, the number of IOPS decreased once we exceeded the array’s default 4KB block size, but throughput continued to scale, approaching 4500 MB/s at both the 8KB and 16KB I/O sizes.
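The inverse relationship between IOPS and I/O size falls out of simple arithmetic; a small sketch (the specific rates below are illustrative, not measurements from these tests, and 1 MB is taken as 10^6 bytes):

```python
KIB = 1024

def mb_per_s(iops, block_kib):
    # Throughput in MB/s (1 MB = 10**6 bytes) at a given IOPS rate
    # and I/O size in KiB.
    return iops * block_kib * KIB / 1e6

# Halving IOPS while doubling the block size leaves throughput unchanged,
# which is why MB/s keeps scaling even as IOPS fall past 4KB:
assert mb_per_s(250_000, 4) == mb_per_s(125_000, 8) == 1024.0
```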

For our second series of experiments, we continued to compare RDM versus VMFS5 backed datastores through a progression of block sizes, but this time we altered the workload mix to include 60% reads and 40% writes.


Violin Memory arrays use a 4KB sector size and perform optimally when managing 4KB blocks. This is clearly visible in the above IOPS results at the 4KB block size. Comparing RDM and VMFS5 IOPS in the above graph, you can see that VMFS5 performs very well with a 60% read, 40% write mix.  Throughput continued to scale in a similar fashion to the read-only experiments, and VMFS5 performance for both IOPS and MB/s was within 0.01% of RDM performance when comparing the average of the deltas across all data points.

The amount of I/O, with just one eight-way VM running on one Violin storage array, is both considerable and sustainable at many I/O sizes.  It’s also noteworthy that a 60% read, 40% write I/O mix still generated substantial IOPS and bandwidth. While in most cases a single VM won’t need to drive nearly this much I/O traffic, these experiments show that vSphere 5.1 is more than capable of handling it.

1 Million IOPS on 1 VM

Last year at VMworld 2011 we presented one million I/O operations per second (IOPS) on a single vSphere 5 host (link).  The intent was to demonstrate vSphere 5’s performance by using multiple VMs to drive an aggregate load of one million IOPS through a single server.  There has recently been some interest in driving a similar I/O load through a single VM.  For some quick experiments prior to VMworld, we used a pair of Violin Memory 6616 flash memory arrays connected to a two-socket HP DL380 server.  vSphere 5.1 was able to demonstrate high performance and I/O efficiency by exceeding one million IOPS, doing so with only a modest eight-way VM.  A brief description of our configuration and results is given below.

Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HyperThreading disabled
Memory: 256GB
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Memory Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Config: 4K IO size w/ 16 workers

Using the above configuration we achieved 1,055,896 total sustained IOPS.  Check out the short video clip from one of our latest runs.
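For a sense of scale, that IOPS figure can be converted to raw bandwidth (assuming 4 KiB per I/O and 1 GB = 10^9 bytes):

```python
iops = 1_055_896        # measured sustained IOPS
io_bytes = 4 * 1024     # 4 KiB per I/O

gb_per_s = iops * io_bytes / 1e9
print(f"{gb_per_s:.2f} GB/s")  # → 4.32 GB/s
```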

Look out for a more thorough write-up after VMworld.


Analysis of Storage Technologies on Clusters using VMmark 2.1

Previous blog entries utilizing VMmark 2.1 introduced the benchmark, showed the effects of generational scaling, and evaluated the scale-out performance of vSphere clusters.  This article analyzes the performance impact of the type of storage infrastructure used, specifically when comparing the effects of Enterprise Flash Drives (EFDs; often referred to as SSDs) versus traditional SCSI HDDs.  There is a general perception, both in the consumer and business space, that EFDs are better than HDDs.  Less clear, however, is how much better and whether the performance benefits of the typically more expensive EFDs are observed in today’s more complex datacenters. 

VMmark 2 Overview:

Once again we used VMmark 2.1 to model the performance characteristics of a multi-host heterogeneous virtualization environment.  VMmark 2.1 runs a combination of application workloads and infrastructure operations simultaneously.  In general, the infrastructure operations increase with the number of hosts in an N/2 fashion, where N is the number of hosts.  To calculate the VMmark 2.1 score, final results are generated from a weighted average of the two kinds of workloads; hence scores will not increase linearly as workload tiles are added.  For more general information on VMmark 2.1, including the application and infrastructure workload details, take a look at the expanded overview in my previous blog post or the VMmark 2.1 release notification written by Bruce Herndon.
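The non-linear scoring behavior can be sketched as follows. The weighting and aggregation below are illustrative stand-ins, not VMmark 2.1’s actual formula, which is defined in its run and reporting rules:

```python
def combined_score(app_per_tile, infra_ops, app_weight=0.8):
    # Application work grows with tile count; infrastructure-operation
    # throughput depends on host count, not tile count, so it stays
    # roughly flat as tiles are added.
    return app_weight * sum(app_per_tile) + (1 - app_weight) * infra_ops

one_tile  = combined_score([1.0], infra_ops=1.0)
two_tiles = combined_score([1.0, 1.0], infra_ops=1.0)
assert two_tiles < 2 * one_tile   # adding a tile less than doubles the score
```

Because the infrastructure component does not grow with tiles, each added tile contributes a shrinking fraction of the total, which is exactly why the scores plotted below flatten as load increases.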

Environment Configuration:

  • Systems Under Test: 2 HP ProLiant DL380 G6
  • CPUs: 2 Quad-Core Intel® Xeon® X5570 @ 2.93 GHz with Hyper-Threading enabled per system
  • Memory: 96GB DDR2 Reg ECC per system
  • Storage Arrays Under Test:
    • HDD: EMC CX3-80
      • 8 Enclosures: RAID0 LUNs, 133.68GB FC HDDs
    • EFD: EMC CX4-960
      • 4 Enclosures: RAID0 LUNs, mix of 66.64GB and 366.8GB FC EFDs
  • Hypervisor: VMware ESX 4.1
  • Virtualization Management: VMware vCenter Server 4.1

Testing Methodology:

To analyze the comparative performance of EFDs versus HDDs with VMmark 2.1, a vSphere DRS enabled cluster consisting of two identically-configured HP ProLiant DL380 servers was connected to the two EMC storage arrays.  A series of tests were then conducted against the cluster with the same VMs being moved to the storage array under test, increasing the number of tiles until the cluster approached saturation.  Saturation was defined as the point where the cluster was unable to meet the minimum quality-of-service (QoS) requirements for VMmark 2.1.  The minimum configuration for VMmark 2.1 is a two-host cluster running a single tile.  The result from this minimal configuration on the HDD storage array was used as the baseline, and all VMmark 2.1 data in this article were normalized to that result.  In addition to the standard VMmark 2.1 results, esxtop data was also collected during the measurement phase of the benchmark to provide additional statistics. 


In a top-down approach to reviewing the two storage technologies, it seems natural that the first point of comparison would be the overall performance of VMmark 2.1.  By comparing the normalized scores, it’s possible to immediately see the impact of running our cluster on EFDs versus traditional HDDs at a variety of load levels.



The improvement in score is apparent at every point of utilization, from the lowest-loaded 1-tile configuration out to the saturation point of 6 tiles.  Overall, the average improvement in score for the EFD configuration was 25.4%.  And while the HDD configuration was unable to meet the QoS requirements at 6 tiles, the EFD configuration not only met the requirements, but also improved the overall VMmark 2.1 score, even when the cluster was completely saturated (as seen in the graph below).  VMmark 2.1 can drive a considerable amount of I/O, up to many thousands of IOPS for large numbers of tiles.  Digging deeper into the root cause of such dramatic improvement for EFDs led me to investigate the overall throughputs for each of the configurations. 



It’s apparent from the above graph that there was significant improvement in total bandwidth, represented by Total MB/s, in the EFD configuration.  Compared to the HDD configuration, the EFD configuration’s total throughput improved at each successive load level (8%, 9.2%, 9.5%, 6.5%, and 14.5%), and the amount of improvement generally increased as the I/O demands on the cluster increased.  Another interesting detail that arose from reviewing the data over numerous points of utilization was that %CPU used on the EFD configuration was typically higher than its HDD counterpart at the same load.  Although slightly counter-intuitive at first, it makes sense that if the system is waiting less for I/Os to complete, it can spend more time doing actual work, as demonstrated by the higher VMmark 2.1 scores.  This observation leads to another interesting comparison.  Disk latency characteristics are often used to predict hardware performance.  This can be useful, but what can be unclear is how this translates to real-world disk latencies running a diverse set of workloads.



Above is a series of graphs that display the average latency reported per write and read I/Os (note that lower latency is better).  In looking at each of the key latency counters we can get a better sense for where the additional performance is derived.  There’s a generalization that EFDs have poor write speeds by comparison to today’s HDDs.  The results here show that the generalization doesn’t always apply.  In fact, when looking at the average write latency for the tested EFDs across all data points, it was within 1% of the average write latency for the tested HDDs.  Additionally, reviewing the read latency comparison data showed massive reductions in latency across all workload levels, 76% on average.  Depending on the workload being run, this in itself could be all the justification needed to move to the newer technology.

It isn’t surprising that EFDs outperformed HDDs.  What is somewhat unexpected is the amount of performance, and the ability for EFDs to show immediate advantages even on the most lightly loaded clusters. With an average VMmark 2.1 score improvement of 25.4%, an average bandwidth increase of 9.6%, and a combined average read latency reduction of 76%, it’s easy to imagine there are a great many environments that might benefit from the real-world performance of EFDs. 




Exploring Generational Scaling with VMmark 2.1

The steady march of technological improvements is nothing new.  As companies either expand or refresh their datacenters it often becomes a non-trivial task to quantify the returns on hardware investments.  This difficulty can be further compounded when it’s no longer sufficient to answer how well one new server will perform in relation to its predecessor, but rather how well the new cluster will perform in comparison to the previous one.  With this in mind, we set off to explore the generational scaling performance of two clusters made up of very similar hardware using the newly released VMmark 2.1.  VMmark 2.1 is a next-generation, multi-host virtualization benchmark that models not only application performance but also the effects of common infrastructure operations.  For more general information on VMmark 2.1, including the application and infrastructure workload details, take a look at the expanded overview in one of my previous blog posts.

Environment Configuration:

  • Clusters Under Test
    • Cluster 1
      • Systems Under Test: 2 x Dell PowerEdge R805
      • CPUs: 2 Six-Core AMD Opteron™ 2427 @ 2.2 GHz
      • Memory: 128GB DDR2 Reg ECC @ 533MHz
      • Storage Array: EMC CX4-120
      • Hypervisor: VMware ESX 4.1
      • Virtualization Management: VMware vCenter Server 4.1
    • Cluster 2
      • Systems Under Test: 2 x Dell PowerEdge R815
      • CPUs: 2 Twelve-Core AMD Opteron™ 6174 @ 2.2 GHz
      • Memory: 128GB DDR3 Reg ECC @ 1066MHz
      • Storage Array: EMC CX4-120
      • Hypervisor: VMware ESX 4.1
      • Virtualization Management: VMware vCenter Server 4.1
  • VMmark 2.1

Testing Methodology:

To measure the generational improvement of the two clusters under test every attempt was made to set up and configure the servers identically.  The minimum configuration for VMmark 2.1 is a two-host cluster running a single tile.  The result from this minimal configuration on the older cluster, or cluster #1, was used as the baseline and all VMmark 2.1 scalability data in this article were normalized to that score.  A series of tests were then conducted on each of the clusters in isolation, increasing the number of tiles being run until the cluster approached saturation.  Saturation was defined as the point where the cluster was unable to meet the minimum quality-of-service (QoS) requirements for VMmark 2.1.  Results that were unable to meet minimum QoS for VMmark 2.1 were not plotted.


The primary component of change between the two clusters, making up the predominant factor in the generational scaling, is the change in processors.  The AMD Opteron™ 2427 processors provide six cores per socket for a total of twelve logical processors per server, whereas the newer AMD Opteron™ 6174 processors have twelve cores per socket for a total of twenty-four logical processors per server.  Factor in a doubling of the L3 cache per socket, as well as a doubling of the systems’ memory speeds, and the change in server characteristics is quite significant.


As shown in the above graph, the generational scaling between the two clusters under test is significant.  In the one-tile case, both clusters were able to perform the work requested without resource constraints.  The performance improvement of the newer cluster became more apparent once we started scaling up the number of tiles and significantly increasing the level of CPU overcommitment and utilization.  It’s important to note that while adding tiles effectively increases the application workload requests linearly, the workload caused by infrastructure operations does not scale the same way, and was constant across all tests.  Cluster #1 scaled to three tiles, at which point it was saturated and unable to support additional tiles while continuing to meet the minimum quality-of-service (QoS) requirements of the benchmark.  For comparison, Cluster #2 achieved an increase in normalized VMmark 2.1 scores of 1%, 14%, and 9% for the one-tile, two-tile, and three-tile configurations, respectively.  Cluster #2 was then scaled to seven tiles, beyond which point it was unable to meet the minimum QoS requirements.

The newer-generation cluster, with two Dell PowerEdge R815 AMD Opteron™ 6174-based hosts running vSphere 4.1, exhibited excellent scaling as the load was increased up to seven tiles, more than doubling the previous-generation cluster’s performance and work accomplished.  Because VMmark 2.1 not only utilizes heterogeneous applications across a diverse computing environment, but also measures the impact of commonplace infrastructure operations, it provided valuable insight into the generational scaling of the two cluster generations.  VMmark 2.1 proved itself an able benchmark for answering previously difficult datacenter questions.