
Monthly Archives: July 2009

Exchange Performs Well Using Fibre Channel, iSCSI, and NFS on vSphere

In a new whitepaper, a large installation of 16,000 Exchange users was configured across eight virtual machines (VMs) on a single VMware vSphere 4 server.  The storage used for the test was a NetApp FAS6030 array that supported the Fibre Channel, iSCSI, and NFS storage protocols.  This allowed for a fair comparison of the three protocols on the same hardware.  The test results show that all three protocols performed well, with Fibre Channel leading the way and iSCSI and NFS following closely behind.

 

Similar tests comparing Fibre Channel, iSCSI, and NFS on ESX have been done in the past, using IOmeter to measure storage performance.  In this new round of tests, Microsoft Exchange Load Generator (LoadGen) was used to simulate an 8-hour workday.

 

The results show that Fibre Channel provided the best performance with the lowest CPU utilization, and iSCSI and NFS were relatively close to each other in performance.  The two graphs below summarize the test results, showing the SendMail average latency as reported by LoadGen and the overall CPU utilization of the ESX server.

 

[Graph: Overall CPU utilization of the ESX server for each storage protocol]

[Graph: LoadGen SendMail average latency for each storage protocol]

 

The complete whitepaper has all of the configuration details and additional test results.

 

Using a Private Cloud to Improve Exchange Performance

There has been quite a bit of talk about clouds lately.  VMware uses the terms private cloud and external cloud to distinguish between two different types of clouds.  Although you may not realize it, if you are an existing VMware customer you might be running a private, or internal, cloud right now.  If you run a number of ESX servers in a cluster with vSphere features such as VMware High Availability (HA), the Distributed Resource Scheduler (DRS) for load balancing, and quick provisioning with clones and templates, there is in fact an actual cloud in your datacenter right now (even if your datacenter is more like a data closet, it still counts).  Our recent testing of a simulated full business day across four time zones showed up to an 18% performance advantage for Exchange 2007 on a private cloud.

Putting Exchange in the Cloud

A couple of years ago I did some testing with Exchange Server 2007 and VMotion and presented the results at VMworld 2007.  The findings from that test showed that Exchange VMs could be moved with VMotion with minimal impact on performance.  VMotion is one of the enabling features of DRS within a vSphere cluster, so it made sense to test Exchange in that internal cloud environment and measure the performance.

Private Cloud Configuration

A simple but powerful private cloud was created in the Dell TechCenter lab with two Dell M905 blades and some EqualLogic PS5000 iSCSI storage arrays.  Each server had four AMD Opteron 8356 quad-core processors and 96 GB of RAM.  Four EqualLogic PS5000XV arrays with 16 15K RPM disks were used to host the Exchange mailbox databases and logs.  An EqualLogic PS5000E with 16 7200 RPM SATA disks was used to host the VM OS partitions.

Test Scenario

A 16,000-user company in the US has its employees evenly spread across four time zones: East, Central, Mountain, and Pacific.  There are two Exchange mailbox server VMs per time zone, with each VM supporting 2,000 users.  Each VM had 4 vCPUs and 14 GB of RAM.  As each group of users starts their eight-hour workday, load increases on the Exchange servers, and as each group ends their day, load decreases.  This scenario was simulated by scripting Microsoft Exchange Load Generator (LoadGen) to start each group of users one hour apart.  The chart below shows the I/O operations per second (IOPS) during the test, with the points where users start and stop their activity marked.

[Chart: IOPS during the test, with user group start and stop points marked]
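For anyone who wants to reproduce the staggered start, the scheduling piece is simple to script.  The sketch below only illustrates the one-hour offsets; the loadgen_start.cmd command and the group names are placeholders, not the actual LoadGen invocation used in these tests.

# Hypothetical sketch of staggering four LoadGen user groups one hour apart.
# The command line below is a placeholder; substitute the real LoadGen
# invocation and configuration for each group.
import subprocess
import time

GROUPS = [("East", 0), ("Central", 1), ("Mountain", 2), ("Pacific", 3)]
HOUR = 3600  # seconds

def start_group(name):
    # Placeholder command for kicking off one group's simulated workday.
    return subprocess.Popen(["loadgen_start.cmd", name])

start = time.time()
procs = []
for name, offset_hours in GROUPS:
    # Wait until this group's scheduled start time, then launch it.
    time.sleep(max(0, start + offset_hours * HOUR - time.time()))
    procs.append(start_group(name))

# Wait for every group to finish its eight-hour run.
for p in procs:
    p.wait()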

The test was run once with no DRS and once with DRS enabled.  The starting VM placement was the same in both cases with the same four VMs on each server.  The VMs were placed “in order” on the servers with the East and Central VMs on the first server, and the Mountain and Pacific VMs on the second server.

The first graph below shows the CPU utilization of the ESX Servers during the test without DRS.  The load is uneven between the two servers at the beginning and end of the test when load is changing, but is even during the middle of the test when all users are active.

[Graph: ESX server CPU utilization during the test without DRS]

Adding the Exchange VMs to a DRS-enabled cloud allows the load across the servers to be kept more even over the course of the test.  The next graph shows the CPU utilization of the ESX hosts with DRS enabled.  The VMotion events initiated by DRS are called out by the vertical lines; each VMotion was in response to a divergence in the two servers' CPU utilization levels.

[Graph: ESX host CPU utilization during the test with DRS enabled; vertical lines mark DRS-initiated VMotion events]
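As a rough illustration of the kind of decision DRS was making in the graph above, the balancing rule can be thought of as: when the hosts' CPU utilization diverges past a threshold, move a VM from the busier host to the idler one.  The sketch below is only a toy model of that behavior, not VMware's actual DRS algorithm, and the threshold value is an assumption.

# Toy model of a DRS-style rebalancing decision between two hosts. This is
# not VMware's actual algorithm; it only mirrors the behavior described above,
# where each VMotion responds to a divergence in host CPU utilization.

DIVERGENCE_THRESHOLD = 15.0  # percentage points; an assumed value

def rebalance(host_a_vms, host_b_vms, vm_cpu):
    """host_*_vms: lists of VM names; vm_cpu: per-VM CPU load in percent."""
    load_a = sum(vm_cpu[v] for v in host_a_vms)
    load_b = sum(vm_cpu[v] for v in host_b_vms)
    gap = abs(load_a - load_b)
    if gap <= DIVERGENCE_THRESHOLD:
        return None  # balanced enough; no VMotion needed
    src, dst = (host_a_vms, host_b_vms) if load_a > load_b else (host_b_vms, host_a_vms)
    # Move the VM whose load best closes the gap without overshooting it.
    candidate = min(src, key=lambda v: abs(gap / 2 - vm_cpu[v]))
    src.remove(candidate)
    dst.append(candidate)
    return candidate  # the VM that would be moved with VMotion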

The result was that performance was better by as much as 18% for some users, and by an average of 8% across all users, with the Exchange VMs running in the internal cloud.  The table below shows the complete results in terms of LoadGen Send Mail 95th percentile latency.

[Table: LoadGen Send Mail 95th percentile latency results, with and without DRS]

Exchange LoadGen only reports performance at the end of the run, but the gain in the DRS test comes from keeping the initial CPU spikes lower as each new group of users logs on at the beginning of its workday.  There is some variation from group to group as a result of the noise in the Exchange workload, but latency was better with DRS enabled in seven of the eight user groups, showing that the advantage is broad rather than driven by a single group.
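To make the per-group comparison concrete, each group's raw Send Mail latency samples reduce to a 95th percentile and a percentage improvement, as in the sketch below.  The latency values shown are made-up placeholders, not the data measured in this test.

# Sketch of reducing raw LoadGen Send Mail latencies (milliseconds) to a 95th
# percentile and a DRS-vs-no-DRS improvement. Sample values are placeholders.
from statistics import quantiles

def p95(samples):
    # 95th percentile of a list of latency samples.
    return quantiles(samples, n=100)[94]

def improvement(no_drs_samples, drs_samples):
    base, drs = p95(no_drs_samples), p95(drs_samples)
    return 100.0 * (base - drs) / base  # positive means DRS was faster

# Placeholder data for one user group.
no_drs = [210, 250, 400, 380, 900, 320, 610, 275, 830, 460]
with_drs = [200, 240, 350, 330, 760, 300, 520, 260, 700, 410]
print(f"95th percentile improvement: {improvement(no_drs, with_drs):.1f}%")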

Conclusion

With all 16,000 users running, the CPUs on the two servers are under 50% utilized, meaning that this workload is not CPU constrained.  In cases where CPUs are oversubscribed and running at near 100% utilization, it is easy to see how DRS can achieve performance gains by moving VMs to servers with more available CPU cycles.  There is also a small performance cost each time a VMotion occurs, which works against the DRS-enabled test.  Despite these factors, there was still up to an 18% advantage for a given group of users and an average 8% advantage for the vSphere private cloud.

Additionally, simply measuring the performance advantage of DRS does not take into account all of the other benefits of running Exchange on a vSphere cloud.  Potential power savings from Distributed Power Management (DPM), time savings from easy deployment using templates and clones, and increased availability with HA and Fault Tolerance are all additional advantages of running in the cloud.

Application Performance Improvement with DRS

Summary

VMware Distributed Resource Scheduler (DRS), a unique feature of VMware vSphere 4, dynamically allocates and balances computing resources in a cluster of vSphere hosts. DRS continuously monitors utilization across resource pools and intelligently allocates available resources among virtual machines based on policies specified by administrators. DRS not only provides efficient resource management but can also deliver significant VM performance gains. In experiments conducted at EMC labs, we observed VM performance improvements as high as 47% with DRS when running virtualized SQL databases on a cluster of vSphere hosts.


Why DRS?

VMware vSphere provides a virtual platform to consolidate many servers onto fewer physical hosts. However, in such consolidation scenarios, unexpected spikes in the resource demands of the VMs can cause the total resource requirements to exceed the available resources on their host. A manual approach to mitigate this problem is to estimate the individual as well as aggregate resource demands upfront and place the VMs intelligently on the hosts based on the estimation.

But even if the hosts are balanced initially, there is no guarantee that the resource demands of the VMs will stay constant and that system loads on ESX will remain balanced. A change in workload behavior may cause resource demand to change, which can lead to contention among VMs for CPU cycles on some hosts while CPU cycles remain unused on other, lightly loaded hosts.

DRS provides an automated mechanism to manage these resource demands. It monitors the resource needs of the VMs when they are powered on and allocates resources by placing the VMs on appropriate hosts in the cluster. If the resource demands change after the initial placement, DRS automatically relocates VMs to hosts where resources are readily available. VMs continue to get the resources their workloads demand and thus deliver the same performance they would if they were running on dedicated hardware.
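As a simplified illustration of the placement idea (not the actual DRS algorithm, which also weighs memory, reservations, affinity rules, and migration cost), initial placement can be pictured as putting each VM on the host with the most spare capacity:

# Simplified illustration of DRS-style initial placement: each VM goes to the
# host with the most remaining CPU headroom. The real DRS algorithm considers
# far more than CPU demand; this sketch only captures the basic idea above.

def place_vms(host_capacity, vm_demand):
    """host_capacity: host -> CPU capacity (MHz); vm_demand: vm -> demand (MHz).
    Returns vm -> chosen host."""
    free = dict(host_capacity)          # remaining capacity per host
    placement = {}
    # Place the most demanding VMs first so large VMs are not stranded.
    for vm in sorted(vm_demand, key=vm_demand.get, reverse=True):
        host = max(free, key=free.get)  # host with the most headroom
        placement[vm] = host
        free[host] -= vm_demand[vm]
    return placement

# Example: four identical hosts and sixteen VMs with mixed demands.
hosts = {f"esx{i}": 24000 for i in range(1, 5)}
vms = {f"vm{i}": (8000 if i % 2 else 1200) for i in range(1, 17)}
print(place_vms(hosts, vms))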

Methodology

We created a DRS cluster consisting of 4 ESX hosts, all identical in hardware configuration (refer to "Configuration Details" for more information). On each host we created 4 VMs, and in each VM we installed SQL Server 2005 and the DVD Store version 2.0 (DS2) database.

We created two DS2 workload profiles (a rough duty-cycle illustration of the two targets follows below):

  • Heavy: drove vCPU utilization in a VM to 70%
  • Light: drove vCPU utilization in a VM to 10%
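The DS2 driver itself controls load through its database size, thread count, and think time; purely to make the two utilization targets concrete, a toy duty-cycle load generator for one vCPU might look like the sketch below.  This is an illustration only, not how DS2 generates load.

# Toy CPU load generator illustrating the heavy (~70%) and light (~10%)
# profiles as duty cycles on one vCPU. DS2 drives load with SQL queries;
# this sketch only makes the utilization targets concrete.
import time

def burn_cpu(target_utilization, duration_s, period_s=0.1):
    """Busy-spin for target_utilization of each period, sleep for the rest."""
    end = time.time() + duration_s
    while time.time() < end:
        busy_until = time.time() + period_s * target_utilization
        while time.time() < busy_until:
            pass                                  # busy loop = CPU work
        time.sleep(period_s * (1.0 - target_utilization))

burn_cpu(0.70, duration_s=60)   # "Heavy" profile: ~70% of one vCPU
burn_cpu(0.10, duration_s=60)   # "Light" profile: ~10% of one vCPU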

We randomly assigned these profiles to the VMs as shown in Table 1. Though the number of VMs was the same on all hosts, differences in the application load led to CPU resource contention on some hosts and unused CPU resources on the remaining hosts.

Table 1. VM Workload Profiles

Host 1    | Host 2    | Host 3    | Host 4
4H        | 4H        | 4L        | 4L
4H        | 3H / 1L   | 1H / 3L   | 4L
4H        | 2H / 2L   | 2H / 2L   | 4L
4H        | 2H / 2L   | 1H / 3L   | 1H / 3L
3H / 1L   | 3H / 1L   | 2H / 2L   | 4L
3H / 1L   | 3H / 1L   | 1H / 3L   | 1H / 3L
3H / 1L   | 2H / 2L   | 2H / 2L   | 1H / 3L

H – VM with 70% CPU utilization; L – VM with 10% CPU utilization

For each test case in Table 1, we ran a DS2 workload simultaneously in all VMs with DRS disabled. We collected the application throughput (Orders per Minute, or OPM) from all the VMs.

We repeated the experiments after enabling DRS. During each test case, DRS migrated a few VMs based on the resource demands across the hosts. The final balanced configuration achieved in each case was the same and is given in Table 2. We measured the aggregate throughput from all the VMs in this balanced configuration.
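The headline numbers in the next section are simply the ratio of aggregate heavy-VM throughput with and without DRS.  The sketch below shows that arithmetic; the OPM values are placeholders, not the measured results.

# Computing the aggregate throughput gain of the heavily loaded VMs with DRS
# enabled versus disabled. The per-VM OPM values below are placeholders.

def drs_gain(opm_no_drs, opm_with_drs):
    """Percentage gain in aggregate OPM of the heavy VMs."""
    total_no = sum(opm_no_drs)
    total_with = sum(opm_with_drs)
    return 100.0 * (total_with - total_no) / total_no

no_drs = [1800, 1750, 1900, 1650, 1700, 1850, 1600, 1780]
with_drs = [2300, 2250, 2400, 2200, 2280, 2350, 2150, 2320]
print(f"Aggregate gain with DRS: {drs_gain(no_drs, with_drs):.1f}%")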

Table 2. Balanced DRS cluster

Host 1    | Host 2    | Host 3    | Host 4
2H / 2L   | 2H / 2L   | 2H / 2L   | 2H / 2L


Key Findings

Figure 1 compares the aggregate throughput of all heavily loaded VMs in each of our test cases (Table 1) with and without DRS. We have not shown the performance of the lightly loaded VMs because there was no change in their performance.

Figure 1. Performance Gains with DRS

With DRS we observed:

  • 15% to 47% gains in aggregate performance for the cases tested.
  • The higher the resource demand and imbalance in the cluster, the higher the performance gain.
  • No performance impact when the cluster was already balanced.

This testing was the result of a joint effort between VMware and EMC. We would like to thank the Midrange Partner Solutions Engineering team at EMC, Santa Clara, for providing access to the hardware, for the use of their lab, and for their collaboration throughout this project.

Configuration Details

ESX Hosts (4)
HP DL380
4 Dual socket, Quad core Intel Xeon 5450 3.0GHz
32GB of Memory
Dual port QLogic QLE2462 HBA

VC Server (1)
HP DL380
4 Dual socket, Quad core Intel Xeon 5450 3.0GHz
8GB of Memory

Load Generators (4)
Dual socket, Dual core server
8GB of Memory

Storage (1)
CX4-960 with 188 15K RPM FC disks

Virtual Platform:

VMware vSphere

Virtual Machines (16)
4 virtual CPUs
5GB memory
Windows Server 2003 x64 with SP2
SQL Server 2005 x64 with SP2
DVD Store version 2 (Large sized database)
http://www.delltechcenter.com/page/DVD+Store

Tuning
DRS aggressiveness threshold: 5 (most aggressive)
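For reference, the same DRS settings can be applied programmatically against vCenter.  The sketch below uses the open-source pyVmomi library; the vCenter hostname, credentials, and cluster name are placeholders, and the assumption that vmotionRate=1 corresponds to the most aggressive threshold should be checked against the vSphere API documentation before use.

# Hedged sketch: enable DRS in fully automated mode with an aggressive
# migration threshold using pyVmomi. Hostname, credentials, and cluster name
# are placeholders; the vmotionRate=1 == "most aggressive" mapping is an
# assumption to verify against the vSphere API docs.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    # Locate the cluster object by name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "DRS-Cluster")

    # Build a DRS config: enabled, fully automated, aggressive threshold.
    drs = vim.cluster.DrsConfigInfo(
        enabled=True,
        defaultVmBehavior=vim.cluster.DrsConfigInfo.DrsBehavior.fullyAutomated,
        vmotionRate=1)
    cluster.ReconfigureComputeResource_Task(
        vim.cluster.ConfigSpecEx(drsConfig=drs), modify=True)
finally:
    Disconnect(si)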

For comments or questions, please join us on the VMware Performance Community website.

About the Authors:
Chethan Kumar is a member of the Performance Engineering team at VMware. Radhakrishnan Manga is a member of the Midrange Partner Solutions Engineering team at EMC.