
Author Archives: Joshua Schnee

Two Host Matched-Pair Scaling Utilizing VMmark 2

As mentioned in Bruce’s previous blog, VMmark 2.0 has been released.  With its release we can now benchmark an enterprise-class cloud platform in entirely new and interesting ways.  VMmark 2 is based on a multi-host configuration that uses bursty application and infrastructure workloads to drive load against a cluster.  For the first time, VMmark 2 allows the analysis of infrastructure operations within a controlled benchmark environment, distinguishing it from pure server-consolidation benchmarks.

Leading off a series of new articles introducing VMmark 2, the goal of this article is to provide a bit more detail about VMmark 2 and to test a vSphere enterprise cloud, focusing on the scaling performance of a matched pair of systems.  More simply put, this blog looks at what happens to cluster performance as more load is added to a pair of identical servers.  This is important because it provides a means of gauging the efficiency of a vSphere cluster as demand increases.

VMmark 2 Overview

VMmark 2 is a next-generation, multi-host virtualization benchmark that models not only application performance but also the effects of common infrastructure operations. It models application workloads using the now-familiar VMmark 1 tile-based approach, where the benchmarker adds tiles until either a goal is met or the cluster is saturated.  It’s important to note that while adding tiles effectively increases the application workload requests linearly, the load caused by infrastructure operations does not scale in the same way.  VMmark 2 infrastructure operations scale with cluster size to better reflect modern datacenters.  Greater detail on workload scaling can be found in the benchmarking guide available for download.  The final VMmark 2 score is generated from a weighted average of the two kinds of workloads; hence scores will not increase linearly as tiles are added.  In addition to the throughput metrics, quality-of-service (QoS) metrics are also measured, and minimum standards must be maintained for a result to be considered fully compliant.
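To make the idea of a blended metric concrete, here is a minimal sketch of a weighted-average score. The 0.75/0.25 weights, the baseline values, and the function name are illustrative assumptions, not VMmark 2’s published scoring constants:

```python
# Sketch of a weighted-average benchmark score. The weights and
# baselines below are illustrative assumptions, not VMmark 2's
# published constants.

def blended_score(app_throughput, infra_throughput,
                  app_baseline, infra_baseline,
                  app_weight=0.75, infra_weight=0.25):
    """Normalize each workload class to its baseline, then blend."""
    app_ratio = app_throughput / app_baseline
    infra_ratio = infra_throughput / infra_baseline
    return app_weight * app_ratio + infra_weight * infra_ratio

# Doubling application throughput while infrastructure throughput
# holds constant does not double the blended score.
print(blended_score(2.0, 1.0, 1.0, 1.0))  # 1.75
```

This is why adding tiles, which grows only the application side of the metric, does not produce a linear increase in the final score.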

VMmark 2 contains the combination of the application workloads and infrastructure operations running simultaneously.  This allows for the benchmark to include both of these critical aspects in the results that it reports.  The application workloads that make up a VMmark 2 tile were chosen to better reflect applications in today’s datacenters by employing more modern and diverse technologies.  In addition to the application workloads, VMmark 2 makes infrastructure operation requests of the cluster.  These operations stress the cluster with the use of vMotion, storage vMotion and Deploy operations.  It’s important to note that while the VMmark 2 harness is stressing the cluster through the infrastructure operations, VMware’s Distributed Resource Scheduler (DRS) is dynamically managing the cluster in order to distribute and balance the computing resources available.  The diagrams below summarize the key aspects of the application and infrastructure workloads.

VMmark 2 Workload Details:

[Figure: VMmark 2.0 application workload tile]

Application Workloads – each tile consists of the following workloads and VMs:

• DVD Store 2.1 – multi-tier OLTP workload consisting of a database VM and three web server VMs driving a bursty load profile

• Exchange 2007

• Standby server (heartbeat server)

• Olio – multi-tier social networking workload consisting of a web server and a database server

[Figure: VMmark 2.0 infrastructure workloads]

Infrastructure Workloads – consists of the following operations:

• User-initiated vMotion

• Storage vMotion

• Deploy: VM cloning, OS customization, and updating

• DRS-initiated vMotion to accommodate host-level load variations

 

Environment Configuration:

  • Systems Under Test : 2 HP ProLiant DL380 G6
  • CPUs : 2 Quad-Core Intel® Xeon® X5570 CPUs @ 2.93 GHz with Hyper-Threading enabled
  • Memory : 96GB DDR2 Reg ECC
  • Storage Array : EMC CX380
  • Hypervisor : VMware ESX 4.1
  • Virtualization Management : VMware vCenter Server 4.1.0

Testing Methodology:

To test scalability as the number of VMmark 2 tiles increases, two HP ProLiant DL380 servers were configured identically and connected to an EMC CLARiiON CX-380 storage array.  The minimum configuration for VMmark 2 is a two-host cluster running one tile; as such, this was our baseline, and all VMmark 2 scores were normalized to this result.  A series of tests was then conducted on this two-host configuration, increasing the number of tiles until the cluster approached saturation and recording both the VMmark 2 score and the average cluster CPU utilization during the run phase.
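The normalization step described above can be sketched in a few lines; the raw scores below are hypothetical placeholders, not measured results:

```python
# Normalize raw scores to the 1-tile baseline, as described above.
# The raw scores are hypothetical placeholders, not measured data.

def normalize_to_baseline(raw_scores, baseline_tiles=1):
    """Return scores scaled so the baseline configuration equals 1.0."""
    base = raw_scores[baseline_tiles]
    return {tiles: score / base for tiles, score in raw_scores.items()}

raw = {1: 1.28, 2: 2.49, 3: 3.68, 4: 4.86}   # hypothetical raw scores
normalized = normalize_to_baseline(raw)
# The 1-tile entry is 1.0 by construction; the other entries
# directly express scaling relative to the baseline.
```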

Results:

In circumstances where demand on a cluster increases, it becomes critical to understand how the environment adapts to those demands in order to plan for future needs.  In many cases it can be especially important for businesses to understand how the application and infrastructure workloads were individually impacted.  By breaking out the distinct VMmark 2 sub-metrics, we get a fine-grained view of how the vSphere cluster responded as the number of tiles, and thus the work performed, increased.

[Figure: VMmark 2.0 detailed scaling]

From the graph above, we see that the VMmark 2 scores show significant gains until the two-host cluster saturates at 5 tiles.  Delving further, we see that, as expected, the infrastructure operations remained nearly constant because the requested infrastructure load did not change during the experiments.  Continued examination shows that the cluster achieved nearly linear scaling for the application workloads through 4 tiles, equivalent to 4 times the application work requested of the 1-tile configuration.  At the 5-tile configuration, the cluster was unable to meet the minimum quality-of-service requirements of VMmark 2; even so, this result helps us understand the performance characteristics of the cluster.

Monitoring how the average cluster CPU utilization changed during the course of our experiments is another critical component to understanding cluster behavior as load increases.  The diagram below plots the VMmark 2 scores shown in the above graph and average cluster CPU utilization for each configuration.

[Figure: VMmark 2.0 cluster scaling]

The resulting diagram helps illustrate the impact on cluster CPU utilization and performance as we incremented the work done by our cluster through the addition of VMmark 2 tiles. The results show that the VMware vSphere matched-pair cluster delivered outstanding scaling of enterprise-class applications while also providing unequaled flexibility in the load balancing, maintenance, and provisioning of our cloud. This is just the beginning of what we’ll see in terms of analysis using the newly released VMmark 2; we plan to explore larger and more diverse configurations next, so stay tuned …

 

Comparing Fault Tolerance Performance & Overhead Utilizing VMmark v1.1.1

VMware Fault Tolerance (FT), based on vLockstep technology and available with VMware vSphere, easily and efficiently provides zero downtime and zero data loss for your critical workloads. FT provides continuous availability in the event of server failures by creating a live shadow instance of the primary virtual machine on a secondary system.  The shadow VM (or secondary VM), running on the secondary system, executes sequences of x86 instructions identical to the primary VM, with which it proceeds in vLockstep.  As a result, a catastrophic failure of the primary system causes an instantaneous failover to the secondary VM that is virtually indistinguishable to the end user. While FT technology is certainly compelling, some potential users express concern about possible performance overhead. In this article, we explore the performance implications of running FT in realistic scenarios by measuring an FT-enabled environment based on the heterogeneous workloads found in VMmark, the tile-based mixed-workload consolidation benchmark from VMware®.

Figure 1: High-Level Architecture of VMware Fault Tolerance


Environment Configuration:

  • Systems Under Test : 2 x Dell PowerEdge R905
  • CPUs : 4 Quad-Core AMD Opteron 8382 (2.6GHz); 4 Quad-Core AMD Opteron 8384 (2.7GHz)
  • Memory : 128GB DDR2 Reg ECC
  • Storage Array : EMC CX380
  • Hypervisor : VMware ESX 4.0
  • Application : VMmark v1.1.1
  • Virtual Hardware (per tile) : 8 vCPUs, 5GB memory, 62GB disk

  • VMware Fault Tolerance currently supports only 1-vCPU VMs and requires specific processors for enablement; for the purposes of our experimentation, our VMmark Database and MailServer VMs were set to run with 1 vCPU only.  For more information on FT and its requirements, see here.
  • VMmark is a benchmark intended to measure the performance of virtualization environments in an effort to allow customers to compare platforms.  It is also useful in studying the effect of architectural features. VMmark consists of six workloads (Web, File, Database, Java, Mail, and Standby servers). Multiple sets of workloads (tiles) can be added to scale the benchmark load to match the underlying hardware resources. For more information on VMmark, see here.


Test Methodology:

An initial performance baseline was established by running VMmark from 1 to 13 tiles on the primary system with Fault Tolerance disabled for all workloads. FT was then enabled for the MailServer and Database workloads after customer feedback suggested they were the applications most likely to be protected by FT. The performance tests were then executed a second time and compared to the baseline performance data.

 

Results:

The results in Table 1 are enlightening as to the performance and efficiency of VMware’s Fault Tolerance.  In this case, “FT-enabled Secondary %CPU” indicates the total CPU utilized by the secondary system under test.  It should also be noted that, for our workload, the default ESX 4.0, High Availability, and Fault Tolerance settings were used, so these results should be considered ‘out of the box’ performance for this configuration.  Finally, the secondary system’s %CPU is much lower than the primary system’s because it runs only the MailServer and Database workloads, as opposed to the six workloads running on the primary system.

Table 1:


You can see that as we scaled both configurations toward saturation, the overhead of enabling VMware Fault Tolerance remained surprisingly consistent, with an average delta in %CPU used of 7.89% over all of the runs.  ESX was also able to achieve very comparable scaling for both FT-enabled and FT-disabled configurations.  It isn’t until the FT-enabled configuration nears complete saturation, a scenario most end users will never see, that we start to see any real discernible delta in scores.

It should be noted that these performance and overhead statements may or may not hold for dissimilar workloads and systems under test.  From the results of our testing, you can see that the advantage of having Mail servers and Database servers truly protected, without fear of end-user interruption, completely justifies the modest overhead.

It’s a tough world out there; you never know when the next earthquake, power outage, or someone tripping over a power cord will strike.  It’s nice to know that your critical workloads are not only safe but running at high efficiency.  The ability of VMware Fault Tolerance technology to provide quick and efficient protection for your critical workloads makes it a standout in the datacenter.

All information in this post regarding future directions and intent is subject to change or withdrawal without notice and should not be relied on in making a purchasing decision of VMware's products. The information in this post is not a legal obligation for VMware to deliver any material, code, or functionality. The release and timing of VMware's products remains at VMware's sole discretion.

Comparing Hardware Virtualization Performance Utilizing VMmark v1.1

Virtualization has just begun to remake the datacenter. One only needs to look at the rapid pace of innovation to know that we are in the midst of a revolution. This is true not only for virtualization software, but also for the underlying hardware. A perfect example of this is new hardware support for virtualized page tables provided by both Intel’s Extended Page Tables (EPT) and AMD’s Rapid Virtualization Indexing (RVI). In general, these features reduce virtualization overhead and improve performance. A previous paper showed how RVI performs with data for a range of individual workloads. As a follow-on, we decided to measure the effects of RVI in a heterogeneous environment using VMmark, the tile-based mixed-workload consolidation benchmark from VMware®.

VMware ESX has the following three modes of operation: software virtualization (Binary Translation, abbreviated as BT), hardware support for CPU virtualization (abbreviated in AMD systems as AMD-V), and hardware support for both CPU and MMU virtualization utilizing AMD-V and RVI (abbreviated as AMD-V + RVI). For most workloads, VMware recommends that users let ESX automatically determine if a virtual machine should use hardware support, but it can also be valuable to determine the optimal settings as a sanity check.

Environment Configuration:

  • System Under Test : Dell PowerEdge 2970
  • CPU : 2 x Quad-Core AMD Opteron 8384 (2.5GHz)
  • Memory : 64GB DDR2 Reg ECC
  • Hypervisor : VMware ESX (build 127430)
  • Application : VMmark v1.1
  • Virtual Hardware (per tile) : 10 vCPUs, 5GB memory, 62GB disk

• AMD RVI works in conjunction with AMD-V technology, a set of hardware extensions to the x86 system architecture designed to improve efficiency and reduce the performance overhead of software-based virtualization solutions.  For more information on AMD virtualization technologies, see here.

• VMmark is a benchmark intended to measure the performance of virtualization environments in an effort to allow customers to compare platforms.  It is also useful in studying the effect of architectural features. VMmark consists of six workloads (Web, File, Database, Java, Mail, and Standby servers). Multiple sets of workloads (tiles) can be added to scale the benchmark load to match the underlying hardware resources. For more information on VMmark, see here.

Test Methodology

By default, ESX automatically runs 32-bit VMs (Mail, File, and Standby) with BT and runs 64-bit VMs (Database, Web, and Java) with AMD-V + RVI.  For these tests, we first ran the benchmark using the default configuration and determined the number of tiles needed to saturate the CPU resources.  All subsequent benchmark tests used this same load level. We next measured the baseline benchmark score with all VMs under test except Standby configured to use BT (i.e., no hardware virtualization features). A series of benchmark tests was then executed while varying the hardware virtualization settings for different workloads to assess their effects in a heavily utilized mixed-workload environment. All of the results presented are relative to the baseline score and illustrate the percentage performance gains achieved over the BT-only configuration.
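Expressing each configuration relative to the BT-only baseline can be sketched as follows; the configuration names and scores are hypothetical stand-ins for the measured data:

```python
# Percentage gain of each configuration over the BT-only baseline.
# The scores below are hypothetical placeholders, not measured results.

scores = {
    "BT only (baseline)": 1.000,
    "AMD-V (Web)":        1.021,
    "AMD-V + RVI (all)":  1.074,
}
baseline = scores["BT only (baseline)"]
gains = {cfg: (s / baseline - 1.0) * 100.0 for cfg, s in scores.items()}

for cfg, gain in gains.items():
    print(f"{cfg}: +{gain:.1f}% over BT")
```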

We began by setting the Standby servers to use AMD-V + RVI.  We then stepped through each of the available workloads and altered the CPU/MMU hardware virtualization settings for that specific workload type.  After determining which setting was best (BT, AMD-V, or AMD-V + RVI), we used that setting for the subsequent tests.

Results


The test results summarized in Table 1 are both interesting and insightful. ESX’s efficient utilization of AMD-V + RVI for each workload highlights a leap forward in virtualization platform performance. Remember that once we determined AMD-V + RVI to be the best setting for a workload, we continued to use that setting for that workload in all subsequent tests unless otherwise noted. For example, in the AMD-V File run below, the Web server VMs were set to AMD-V + RVI, the File server VMs were set to use just AMD-V, and all other non-Standby servers were set to BT.

[Graph: Vroom-RVI-2 – click on graph to enlarge]

By taking advantage of hardware-assist features in the processor, ESX is able to achieve significant performance gains over using software-only virtualization. The default or “out of the box” settings produced good results, and further tuning for this particular set of workloads yielded additional performance gains of nearly 6% for our SUT. 

It should be noted that these performance gains may or may not hold for dissimilar workloads, but for this configuration the improvement made by utilizing an all AMD-V + RVI enabled environment was very impressive. In addition, older processor versions with different cache sizes, clock rates, etc. may produce different results.

It’s probably safe to say that hardware technology will continue to improve support for virtualized environments.  ESX’s ability to make proficient use of the latest hardware innovations, combined with its flexibility in allowing users to run different workloads with different levels of hardware assist, is what truly sets it apart.

All information in this post regarding future directions and intent is subject to change or withdrawal without notice and should not be relied on in making a purchasing decision of VMware's products. The information in this post is not a legal obligation for VMware to deliver any material, code, or functionality. The release and timing of VMware's products remains at VMware's sole discretion.