
Monthly Archives: August 2009

Comparing Fault Tolerance Performance & Overhead Utilizing VMmark v1.1.1

VMware Fault Tolerance (FT), based on vLockstep technology and available with VMware vSphere, easily and efficiently provides zero downtime and zero data loss for your critical workloads. FT provides continuous availability in the event of server failures by creating a live shadow instance of the primary virtual machine on a secondary system. The shadow VM (or secondary VM), running on the secondary system, executes sequences of x86 instructions identical to those of the primary VM, with which it proceeds in vLockstep. As a result, a catastrophic failure of the primary system causes an instantaneous failover to the secondary VM that is virtually indistinguishable to the end user. While FT technology is certainly compelling, some potential users express concern about possible performance overhead. In this article, we explore the performance implications of running FT in realistic scenarios by measuring an FT-enabled environment based on the heterogeneous workloads found in VMmark, the tile-based mixed-workload consolidation benchmark from VMware®.

Figure 1: High-Level Architecture of VMware Fault Tolerance

Environment Configuration:

  • System under Test: 2 x Dell PowerEdge R905
  • CPUs: 4 x Quad-Core AMD Opteron 8382 (2.6GHz) / 4 x Quad-Core AMD Opteron 8384 (2.7GHz)
  • Memory: 128GB DDR2 Reg ECC
  • Storage Array: EMC CX380
  • Hypervisor: VMware ESX 4.0
  • Application: VMmark v1.1.1
  • Virtual Hardware (per tile): 8 vCPUs, 5GB memory, 62GB disk

  • VMware Fault Tolerance currently supports only 1 vCPU VMs and requires specific processors for enablement; for the purposes of our experiments, the VMmark Database and MailServer VMs were configured to run with 1 vCPU. For more information on FT and its requirements, see here.
  • VMmark is a benchmark intended to measure the performance of virtualization environments so that customers can compare platforms. It is also useful for studying the effects of architectural features. VMmark consists of six workloads (Web, File, Database, Java, Mail, and Standby servers). Multiple sets of workloads (tiles) can be added to scale the benchmark load to match the underlying hardware resources. For more information on VMmark, see here.


Test Methodology:

An
initial performance baseline was established by running VMmark from 1 to 13
tiles on the primary system with Fault Tolerance disabled for all workloads. FT
was then enabled for the MailServer and Database workloads after customer
feedback suggested they were the applications most likely to be protected by FT.
The performance tests were then executed a second time and compared to the
baseline performance data.
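For readers who would rather script this step than click through the vSphere Client, the sketch below shows one way it might be done by calling the vSphere API's CreateSecondaryVM_Task operation (the call behind "Turn On Fault Tolerance") through a Python SDK such as pyVmomi. The hostnames, credentials, and VM-name matching are illustrative placeholders, and the binding details are assumptions rather than the exact tooling used for these tests.

```python
# Hypothetical sketch: enable VMware FT on the MailServer and Database VMs of each
# VMmark tile via the vSphere API. Hostnames, credentials, and VM naming below are
# placeholders, not the lab configuration described in this post.
from pyVim.connect import SmartConnect, Disconnect
from pyVomi import vim  # typo guard: the real module name is pyVmomi
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

# Protect only the workloads customer feedback identified as FT candidates.
targets = [vm for vm in view.view
           if vm.name.startswith(("MailServer", "Database"))]

for vm in targets:
    # CreateSecondaryVM_Task spawns the secondary VM on another FT-compatible
    # host in the cluster; the exact pyVmomi binding name is my assumption.
    task = vm.CreateSecondaryVM_Task()
    print(f"Enabling FT for {vm.name}: task state {task.info.state}")

Disconnect(si)
```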

 

Results:

The results in Table 1 illustrate the performance and efficiency of VMware Fault Tolerance. In this case, "FT-enabled Secondary %CPU" indicates the total CPU utilized by the secondary system under test. Note that, for our workload, the default ESX 4.0, High Availability, and Fault Tolerance settings were used, so these results should be considered 'out of the box' performance for this configuration. Finally, the secondary system's %CPU is much lower than the primary system's because it runs only the MailServer and Database workloads, as opposed to the six workloads running on the primary system.

Table 1: [table of VMmark scores and %CPU utilization for the FT-disabled and FT-enabled configurations]

You can see that as we scaled both configurations toward saturation, the overhead of enabling VMware Fault Tolerance remained surprisingly consistent, with an average delta in %CPU used of 7.89% across all of the runs. ESX was also able to achieve very comparable scaling for both the FT-enabled and FT-disabled configurations. It isn't until the FT-enabled configuration nears complete saturation, a scenario most end users will never see, that we start to see any real discernible delta in scores.
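If you want to reproduce this kind of overhead calculation from your own esxtop or vCenter data, the following minimal sketch shows the arithmetic. The per-tile numbers in it are placeholders, not the measured values behind Table 1.

```python
# Placeholder per-tile %CPU samples (FT disabled vs. FT enabled on the primary
# host); substitute your own esxtop/vCenter measurements -- these are NOT the
# values from Table 1.
cpu_ft_disabled = [10.5, 20.8, 31.2, 41.9, 52.6]
cpu_ft_enabled  = [18.1, 28.9, 39.4, 50.2, 60.7]

# Per-run overhead is simply the difference in %CPU; the headline figure in the
# text is the average of these deltas across all runs.
deltas = [on - off for off, on in zip(cpu_ft_disabled, cpu_ft_enabled)]
avg_delta = sum(deltas) / len(deltas)

for tiles, d in enumerate(deltas, start=1):
    print(f"{tiles} tile(s): FT overhead = {d:.2f} %CPU")
print(f"Average FT overhead across runs: {avg_delta:.2f} %CPU")
```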

It should be noted that these performance and overhead results may not hold for dissimilar workloads and systems under test. Even so, the results of our testing show that the modest cost of truly protecting mail servers and database servers, without fear of end-user interruption, is well justified.

It's a tough world out there; you never know when the next earthquake, power outage, or someone tripping over a power cord will strike. It's nice to know that your critical workloads are not only safe, but also running at high efficiency. The ability of VMware Fault Tolerance technology to provide quick and efficient protection for your critical workloads makes it a standout in the datacenter.

All information in this post regarding future directions and intent is subject to change or withdrawal without notice and should not be relied on in making a purchasing decision about VMware's products. The information in this post is not a legal obligation for VMware to deliver any material, code, or functionality. The release and timing of VMware's products remain at VMware's sole discretion.

Performance of Exchange Server 2007 in a Fault Tolerant Virtual Machine

One of the great new features of vSphere is VMware Fault Tolerance (FT), which allows a VM to run in lockstep on two different physical servers at the same time. This provides a high-availability option with virtually no downtime. A whitepaper focused on FT was recently published, along with a blog post that has the complete details about this great new technology. Using an Exchange Server 2007 mailbox VM, we ran some tests to measure the performance of up to 2000 users with FT.

To examine the performance of an FT VM running Exchange Server 2007, a series of tests was run with 1000, 1500, and 2000 users. Performance was measured in terms of CPU utilization and Sendmail response time for the same VM both with and without FT enabled. The results were used to measure the performance impact of using FT as well as the number of users that can be supported by a 1 vCPU VM. (Today FT is supported on 1 vCPU VMs.)

Test Configuration

I worked with the Dell TechCenter team and used two of their Dell PowerEdge blade servers with Intel Nehalem-based Xeon 5500 processors.  The primary server was an M710 with two Intel Xeon X5570 processors running at 2.93GHz and 72GB of RAM.  The secondary server was an M610 with the same type of processors, but with 48GB of RAM.  The terms primary and secondary refer to the portions of the fault tolerant VMs that the servers hosted during the tests.

Both blade servers were in the same chassis, so all FT logging traffic remained local in the chassis Ethernet switch. The servers connected via iSCSI to EqualLogic PS5000XV storage arrays where the OS, data, and log LUNs for the VMs were stored.

The servers were installed with ESX 4.0 and managed by a vCenter Server. VMs were created with 1 vCPU and 10GB of RAM and installed with Windows Server 2008 x64 and the Exchange Server 2007 Mailbox role. Another VM, which acted as the domain controller, Hub Transport, and Client Access server, ran on a third blade server in the same chassis. Microsoft Exchange Load Generator (LoadGen) was used with the Heavy Online user profile to simulate an eight-hour workday.

Fault Tolerant Test Results

The testing showed that the performance of the Exchange VM was affected only slightly when FT was used. Sendmail average latency increased by 10 to 13 milliseconds, and 95th-percentile average latency increased by 33 to 45 milliseconds. All test results were under the 1000 ms threshold at which user experience starts to degrade. These results indicate that, even at 2000 users, the performance of Exchange on a 1 vCPU VM was acceptable with or without FT.
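As a rough illustration of the acceptance check applied to these results, the sketch below compares per-test Sendmail latencies against the 1000 ms threshold. The FT increments match the ranges quoted above, but the baseline latencies are placeholders rather than the measured values.

```python
# Sketch of the acceptance check for LoadGen results: Sendmail latency must stay
# under the 1000 ms point where user experience starts to degrade.
# Baseline latencies below are illustrative placeholders; only the FT increments
# (10-13 ms average, 33-45 ms at the 95th percentile) come from the text.
THRESHOLD_MS = 1000

results = {
    # users: (avg_latency_ms, p95_latency_ms) with FT enabled -- placeholder values
    1000: (150 + 10, 320 + 33),
    1500: (180 + 12, 360 + 40),
    2000: (220 + 13, 410 + 45),
}

for users, (avg_ms, p95_ms) in sorted(results.items()):
    ok = avg_ms < THRESHOLD_MS and p95_ms < THRESHOLD_MS
    print(f"{users} users: avg={avg_ms} ms, p95={p95_ms} ms -> "
          f"{'acceptable' if ok else 'degraded'}")
```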

[Figure: Sendmail latency with and without FT]

The CPU utilization results for the overall system show that the impact of using FT is low. Because the Exchange VM was the only one on the ESX server, overall system utilization was very low, with a peak of just over 7% in the most stressful test. Enabling FT caused only an additional 1 to 1.5% of system CPU to be used. The utilization of the ESX host with the secondary VM was slightly lower than that of the primary. Looking at the CPU utilization of the 1 vCPU VM itself, the average reaches just under 45%, a comfortable level that still leaves room for the bursty nature of Exchange.

[Figure: CPU utilization with and without FT]

Enabling FT for an Exchange VM running on the latest server hardware shows good performance for up to the 2000 users tested, and the effect of FT on the workload was relatively small.  These results show that an Exchange VM can be a good candidate for using FT to enable increased uptime and availability.

 

VMware vSphere™ 4: The CPU Scheduler in VMware® ESX™ 4

VMware recently published a whitepaper that discusses changes to the CPU scheduler in ESX 4. The paper also describes a few key CPU scheduler concepts that are useful for understanding scheduler-related performance issues. Specifically, it attempts to answer the following questions:

  • How is CPU time allocated among virtual machines? How well does it work?
  • What is the difference between "strict" and "relaxed" co-scheduling? What is the performance impact of recent co-scheduling improvements?
  • What is the "CPU scheduler cell"? What happened to the scheduler cell in ESX 4?
  • How does the ESX scheduler exploit underlying CPU architecture features like multi-core, Hyper-Threading, and NUMA?

The following is a brief summary of the paper:

In ESX 4, many improvements have been introduced in the CPU scheduler. These include further relaxed co-scheduling, lower lock contention, and multi-core-aware load balancing. Co-scheduling overhead has been further reduced by accurately measuring co-scheduling skew and by allowing more scheduling choices. Lower lock contention is achieved by replacing the scheduler cell lock with finer-grained locks. By eliminating the scheduler cell, a virtual machine can get higher aggregate cache capacity and memory bandwidth. Lastly, multi-core-aware load balancing achieves high CPU utilization while minimizing the cost of migrations.

Experimental results show that the ESX 4 CPU scheduler faithfully allocates CPU resources as specified by users. While maintaining the benefits of a proportional-share algorithm, the improvements in the co-scheduling and load-balancing algorithms are shown to benefit performance. Compared to ESX 3.5, ESX 4 significantly improves performance on both lightly loaded and heavily loaded systems.
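To make the proportional-share idea concrete, here is a simplified model of how CPU entitlements follow shares. It is a didactic sketch, not the ESX 4 scheduler implementation, and the VM names and host capacity are made up for the example.

```python
# Simplified illustration of proportional-share CPU allocation: each active VM's
# entitlement is its share count divided by the total shares of all active VMs.
# This models the idea only; it is not the ESX 4 scheduler code.
def entitlements(vm_shares, host_capacity_mhz):
    total = sum(vm_shares.values())
    return {vm: host_capacity_mhz * shares / total
            for vm, shares in vm_shares.items()}

# Example shares roughly following the usual per-vCPU presets
# ("Normal" = 1000, "High" = 2000); host capacity is an arbitrary placeholder.
vms = {"web01": 1000, "db01": 2000, "mail01": 1000}
for vm, mhz in entitlements(vms, host_capacity_mhz=10400).items():
    print(f"{vm}: entitled to ~{mhz:.0f} MHz when all VMs are CPU-bound")
```

When the host is undercommitted, an idle VM's unused entitlement is redistributed to the busy VMs, which is why shares only matter under contention.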

For more details, please download and read the full paper here.

Virtual Machine Monitor Execution Modes in vSphere 4.0

Recently we published a whitepaper describing the VMware Virtual Machine Monitor (VMM) execution modes in vSphere 4.0. The VMM may use hardware support for virtualization whenever it is available, or software techniques when hardware support is unavailable or not enabled on the underlying platform. The method the VMM chooses for virtualizing the x86 CPU and MMU is known as the "monitor mode".

This paper attempts to familiarize our customers with the default monitor modes chosen by the VMware VMM for many popular guests running on modern x86 CPUs. Most workloads perform well under these default settings, but in some cases a user may want to override the default monitor mode. We provide a few examples in which overriding the default monitor mode may yield performance benefits, along with two ways in which the user can override the defaults.
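As an illustration of the configuration-file route, the sketch below appends monitor-mode override keys to a powered-off VM's .vmx file. The option names and values shown (monitor.virtual_exec and monitor.virtual_mmu) are my recollection of the documented knobs, so treat them as assumptions and confirm them against the paper; the datastore path is a placeholder, and the vSphere Client's CPU/MMU Virtualization setting is generally the safer route where available.

```python
# Hypothetical sketch: force a monitor mode by appending override keys to a
# powered-off VM's .vmx file. Option names/values are recalled from the paper's
# description and should be verified before use; the path is a placeholder.
VMX_PATH = "/vmfs/volumes/datastore1/myvm/myvm.vmx"

overrides = {
    "monitor.virtual_exec": "hardware",  # CPU virtualization: automatic | hardware | software (assumed values)
    "monitor.virtual_mmu":  "software",  # MMU virtualization: automatic | hardware | software (assumed values)
}

with open(VMX_PATH, "a") as vmx:
    for key, value in overrides.items():
        vmx.write(f'{key} = "{value}"\n')
```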

The default monitor mode chosen by the VMM for a particular guest depends on the hardware features available (or enabled) on the underlying platform and on the guest OS's performance in that mode. Differences in the virtualization support available on modern x86 CPUs, and in guest OS performance when using those features (or when falling back to software techniques where they are unavailable), make choosing the appropriate monitor mode for a given guest on a given x86 CPU a complex problem. For more details, please download and read our full paper here.

VMware Fault Tolerance Performance

If you have been following the virtualization blogosphere, you may have noticed that last week we published three interesting performance whitepapers at once. In this post I want to talk about the "VMware vSphere 4 Fault Tolerance: Architecture and Performance" whitepaper.

VMware Fault Tolerance, also known as FT, is one of the most anticipated features of the vSphere 4.0 release. FT provides continuous high availability to virtual machines, without downtime or disruption, in the event of a complete host failure, and it is designed to be agnostic to the guest operating system and the application running in it. With FT you get hardware-style fault tolerance for virtual machines on commodity server systems without the need for any specialized hardware. All you need are recent processors from Intel or AMD and a gigabit Ethernet link for transmitting FT logging traffic. You can download and use the VMware Site Survey utility or check KB article 1008027 to see which servers in your datacenter have FT-compatible processors.

We first demonstrated a prototype version of FT at the VMworld 2007 keynote, and earlier this year at VMworld Europe 2009 we previewed its performance characteristics. The feature now officially ships with vSphere 4.0; some of you may already be using it, and many others are probably planning to deploy it in their datacenters very soon. If you are wondering how enabling FT impacts performance, the whitepaper has you covered. It provides a concise but thorough look at the architectural and performance aspects of enabling FT, backed by performance data from a variety of workloads. We have been working hard on enhancing the performance of FT for the last few years, and you will be able to see the results in the whitepaper. The two key takeaways from this paper are that you don't need a lot of network bandwidth for FT (a gigabit link is sufficient for the vast majority of workloads) and that, when there is sufficient CPU headroom, FT impacts throughput very little. For more details, I recommend taking a look at the whitepaper.
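As a back-of-the-envelope illustration of why a gigabit link usually suffices, the sketch below estimates FT logging traffic from a workload's disk-read and network-input rates. The sizing formula is a paraphrase of commonly cited FT sizing guidance rather than a figure from the whitepaper, and the workload numbers are placeholders.

```python
# Rough estimate of FT logging bandwidth for a workload. The rule used here
# (disk reads in MB/s * 8 + network input in Mbit/s, plus ~20% headroom) is a
# paraphrase of commonly cited FT sizing guidance -- treat it as an assumption
# and verify against the whitepaper before doing real capacity planning.
def ft_logging_mbps(disk_read_mbytes_per_s, net_input_mbits_per_s, headroom=1.2):
    return (disk_read_mbytes_per_s * 8 + net_input_mbits_per_s) * headroom

# Placeholder workload profile, not a measured result from the paper.
estimate = ft_logging_mbps(disk_read_mbytes_per_s=20, net_input_mbits_per_s=50)
print(f"Estimated FT logging traffic: ~{estimate:.0f} Mbit/s "
      f"({'fits on' if estimate < 1000 else 'exceeds'} a gigabit link)")
```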

I also invite you to my VMworld 2009 session in San Francisco, "BC2961: VMware Fault Tolerance Performance and Architecture", where I will go into more depth and share recent performance numbers.

VMware vCenter Site Recovery Manager Performance and Best Practices White Paper

VMware vCenter Site Recovery Manager (SRM) is a component of VMware Infrastructure that accelerates recovery of the virtual environment through automation, ensures reliable recovery by enabling non-disruptive testing, and simplifies recovery by eliminating complex manual recovery steps and centralizing management of recovery plans.

A whitepaper on VMware vCenter Site Recovery Manager Performance and Best Practices is now available here.

In this performance paper we discuss VMware vCenter Site Recovery Manager 1.0 performance, the various factors on which recovery time depends, performance over high-latency networks, and tips for architecting recovery plans to minimize recovery time.