
Monthly Archives: May 2010

Exchange 2010 Disk I/O on vSphere

In the first part of this series on Exchange 2010 on vSphere, the focus was the scale-up performance of a single Mailbox Server VM.  The results showed that performance was great across all points tested, ranging up to 8000 users.  This article takes a look at disk I/O, an important aspect of Exchange performance.


Enhancements made to Exchange 2010 have resulted in a reduction in disk I/Os per second (IOPS) compared with previous versions.  Microsoft reports this reduction to be as large as 70% in some cases.  Exchange 2010 makes more efficient use of memory as a cache, which results in fewer reads and writes to disk than before; however, sufficient memory is still very important for optimal performance.  Exchange 2010 also uses larger and more sequential I/Os to improve performance.  Tests were conducted in the VMware labs to measure how beneficial additional RAM could be to an Exchange 2010 Mailbox Server VM running on vSphere 4.


Configuration


The same test configuration was used as in the previous blog.  To summarize, a Dell PowerEdge R710 with dual Xeon X5570 quad-core processors and 96 GB of RAM was installed with a development version of vSphere 4 (build 235768). Three Exchange 2010 VMs were created and used for testing: Mailbox Server, Client Access Server (CAS), and Hub Transport Server.   


An EMC CLARiiON CX4-960 provided Fibre Channel storage with two nine-disk RAID 5 LUNs for data and an eight-disk RAID 1/0 LUN for logs.  Normally, RAID 1/0 would have been used for both data and logs because it provides the highest performance with fault tolerance.  Because of the reduction in Exchange 2010 IOPS, RAID 5 was tried for these tests to see whether this lower-performing RAID type would provide acceptable performance.
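
As a rough illustration of why RAID type matters here, the sketch below applies the commonly cited write penalties (four back-end I/Os per host write for RAID 5, two for RAID 1/0) to a hypothetical read/write mix.  The numbers are illustrative, not measurements from these tests.

    # Illustrative back-end IOPS math for RAID 5 vs. RAID 1/0.
    # Write penalties are the usual rules of thumb: one host write
    # costs 4 disk I/Os on RAID 5 and 2 on RAID 1/0.
    WRITE_PENALTY = {"RAID 5": 4, "RAID 1/0": 2}

    def backend_iops(host_iops, write_ratio, raid_type):
        """Disk I/Os the array must service for a given host load."""
        reads = host_iops * (1 - write_ratio)
        writes = host_iops * write_ratio
        return reads + writes * WRITE_PENALTY[raid_type]

    # Hypothetical example: 2000 host IOPS with a 50% write mix.
    for raid in ("RAID 5", "RAID 1/0"):
        print(raid, backend_iops(2000, 0.5, raid))  # 5000.0 vs. 3000.0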


Testing


Exchange 2010 can use RAM as a cache for disk I/O.  This reduces the number of IOPS issued to the physical disks because more requests for information are satisfied from memory.  If the workload is kept constant, a VM that is assigned more RAM will generate fewer IOPS.  As more requests are satisfied from memory, the average response time will also decrease.
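
A minimal model makes the relationship concrete.  Assuming a fixed request rate, a cache hit rate that grows with RAM, and hypothetical memory and disk service times (none of these numbers come from the tests), both disk IOPS and average response time fall as the hit rate rises:

    def disk_iops(request_rate, hit_rate):
        """Only requests that miss the cache reach the disks."""
        return request_rate * (1 - hit_rate)

    def avg_response_ms(hit_rate, mem_ms=0.1, disk_ms=8.0):
        """Average of hypothetical memory and disk service times,
        weighted by the cache hit rate."""
        return hit_rate * mem_ms + (1 - hit_rate) * disk_ms

    for hit_rate in (0.5, 0.7, 0.9):
        print(f"hit rate {hit_rate:.0%}: "
              f"{disk_iops(2000, hit_rate):.0f} disk IOPS, "
              f"{avg_response_ms(hit_rate):.2f} ms average")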


The Mailbox Server VM was configured with 4 vCPUs and a range of memory sizes from 16 to 64 GB.  LoadGen 2010 Beta was used with the Very Heavy Outlook 2007 Online profile and 100 MB mailboxes to simulate 8000 users.  Increasing RAM reduced both IOPS and average SendMail response time.  The charts below show the results of these tests.


[Figure: IOPS and average SendMail response time vs. Mailbox Server VM RAM]


The total IOPS is low for 8000 users, even at the highest recorded level of 1921 IOPS with 16 GB of RAM.  Increasing RAM to 24 and 32 GB produced a significant decrease in IOPS, but additional gains diminish as more RAM is added.  The performance improvement shown by the SendMail response time follows the same curve as IOPS, with big improvements at 24 and 32 GB.  The reduction in disk I/O is directly related to the improvement in performance.
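
To put the worst case in perspective, the peak measurement reported above works out to a small per-mailbox rate (a quick check using only the numbers in this post):

    total_iops = 1921  # measured with 16 GB of RAM
    users = 8000
    print(f"{total_iops / users:.2f} IOPS per mailbox")  # 0.24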


The next chart shows CPU utilization broken out for the Mailbox Server, Hub Transport Server, and CAS VMs during these same tests.  The only change between tests is the amount of RAM assigned to the Mailbox Server VM.


[Figure: CPU utilization of the Mailbox Server, Hub Transport, and CAS VMs across RAM sizes]


CPU utilization of all the roles remained essentially the same across all tests.  The number of IOPS decreases as RAM is increased, but the number of users remains the same.  The VMs are doing the same amount of work, only with better response times.  In this case, CPU utilization is not affected by the change and does not reflect the change in performance.


Conclusion


Assigning more RAM to an Exchange 2010 Mailbox Server VM can decrease IOPS and improve response time.  This makes Exchange disk requirements something that can be tuned for each environment: the same number of disks can support more users, or fewer disks can support the same number of users.  The improvements in Exchange 2010 and the lower cost per GB of RAM in current servers can make this trade-off a worthwhile exercise.  The flexibility of assigning RAM to VMs in granular sizes makes it easy to tune an Exchange 2010 VM to the right amount of RAM.


The next Exchange 2010 on vSphere blog will look at scale-out performance with multiple Mailbox Server VMs.
 

Surveying Virtualization Performance Trends with VMmark

The trends in published VMmark scores are an ideal illustration of the historical long-term performance gains for virtualized platforms. We began work on what would become VMmark 1.0 almost five years ago. At the time, ESX 2.5 was the state-of-the-art hypervisor. Today’s standard features such as DRS, DPM, and Storage VMotion were in various prototype and development stages. Processors like the Intel Pentium 4 5xx series (Prescott) or the single-core AMD 2yy-series Opterons were the high-end CPUs of choice. Second-generation hardware-assisted virtualization features such as AMD’s Rapid Virtualization Indexing (RVI) and Intel’s Extended Page Tables (EPT) were not yet available. Nevertheless, virtualization’s first wave was allowing customers to squeeze much more value from their existing resources via server consolidation. Exactly how much value was difficult to quantify. Our VMmark odyssey began with the overall goal of creating a representative and reliable benchmark capable of providing meaningful comparisons between virtualization platforms.

VMmark 1.0 was released nearly three years ago after two years of painstaking work and multiple beta releases of the benchmark. The reference platform for VMmark 1.x is a two-processor Pentium 4 (Prescott) server running ESX 3.0. That platform was capable of supporting one VMmark tile (six VMs) and by definition achieved a score of 1.0. (All VMmark results are normalized to this reference score.) The graph below shows a sampling of published two-socket VMmark scores for each successive processor generation.
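
For readers unfamiliar with the scoring, the sketch below gives the general shape of a normalized, tile-based score: each workload’s throughput is divided by the reference platform’s throughput, the normalized values within a tile are combined with a geometric mean, and the per-tile scores are summed. This is a simplified illustration with made-up numbers, not the official VMmark scoring code.

    from math import prod

    def tile_score(throughputs, reference):
        """Geometric mean of per-workload throughputs normalized to
        the reference platform (which scores 1.0 by definition)."""
        ratios = [t / r for t, r in zip(throughputs, reference)]
        return prod(ratios) ** (1 / len(ratios))

    def vmmark_style_score(tiles, reference):
        """Sum of per-tile scores; running more tiles at good
        throughput yields a higher score."""
        return sum(tile_score(t, reference) for t in tiles)

    # Made-up throughputs for two tiles of six workloads each.
    reference = [100, 80, 120, 60, 90, 110]
    tiles = [[210, 170, 250, 130, 190, 230],
             [200, 160, 240, 120, 180, 220]]
    print(f"{vmmark_style_score(tiles, reference):.2f}")  # ~4.11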

[Figure: Published two-socket VMmark 1.x scores by processor generation]

ESX 3.0, a vastly more capable hypervisor than ESX 2.5, had arrived by the time of the VMmark 1.0 GA in mid-2007. Greatly improved CPU designs were also available; two processors commonly in use by that time were the dual-core Xeon 51xx series and the quad-core Xeon 53xx series. ESX 3.5 was released with a number of performance improvements, such as TCP Segmentation Offload (TSO) support for networking, in the same timeframe as the Xeon 54xx. Both ESX 4.0 and Intel 55xx (Nehalem) CPUs became available in early 2009. ESX 4.0 was a major new release with a broad array of performance enhancements and support for new hardware features such as EPT and simultaneous multi-threading (SMT), providing a significant boost in overall performance. The recently released hexa-core Intel 56xx CPUs (Westmere) show excellent scaling compared to their quad-core 55xx brethren. (Overall, ESX delivers excellent scaling and takes advantage of increased core counts on all types of servers.) What is most striking to me in this data is the big picture: the performance of virtualized consolidation workloads as measured by VMmark 1.x has roughly doubled every year for the past five years.
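
Compounded, that claim implies roughly a 32x gain over the period; a quick back-of-the-envelope check, not a measured number:

    years = 5
    print(f"Doubling yearly for {years} years -> {2 ** years}x overall")  # 32x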

In fact, the performance of virtualized platforms has increased to the point that the focus has shifted away from consolidating lightly loaded virtual machines on a single server to virtualizing the entire range of workloads (heavy and light) across a dynamic multi-host datacenter. Not only application performance but also infrastructure responsiveness and robustness must be modeled to characterize modern virtualized environments. With this in mind, we are currently developing VMmark 2.0, a much more complex, multi-host successor to VMmark 1.x. We are rapidly approaching a limited beta release of this new benchmark, so stay tuned for more. But in this post, I’d like to look back and remember how far we’ve come with VMmark 1.x. Let’s hope the next five years are as productive.

Exchange 2010 Scale-Up Performance on vSphere

To provide performance guidance for the new Microsoft Exchange Server 2010 on VMware vSphere 4, a series of tests was run in the VMware labs.  The first set of tests involved a single Mailbox Server VM that was scaled up from 1 vCPU to 8 vCPUs with a range of memory sizes.  This article looks at the results of these tests to understand how Exchange 2010 performs in a VM.  Additional blog posts will look more closely at these single-VM tests as well as another set of tests that used up to four Mailbox Server VMs to test Exchange 2010 performance in a scale-out model.

Configuration

The server used for the testing was a Dell PowerEdge R710 with two Intel Xeon X5570 quad-core processors and 96 GB of RAM.  An EMC CLARiiON CX4-960 storage array provided Fibre Channel storage.  A development version of VMware vSphere 4 (build 235786) was installed, and the VMs for the Exchange 2010 test environment were created.

A new Windows Server 2008 R2 test domain was created for this testing.  A separate VM was created for each of the Mailbox, CAS, and Hub Transport Exchange 2010 server roles.  All three VMs were installed with Windows Server 2008 R2 x64 Enterprise Edition, added to the new Exchange 2010 test domain, and given the necessary prerequisites for Exchange 2010.  Once the Exchange server roles were installed, additional storage was added to the Mailbox Server VM: two nine-disk RAID 5 LUNs for the mailbox databases and an eight-disk RAID 1/0 LUN for the logs.

Microsoft Exchange Load Generator 2010 Beta (LoadGen) was used for this testing.  This tool simulates Exchange 2010 users and can be highly customized.  For these tests, the Very Heavy Outlook 2007 online profile with 100 MB mailboxes was used; defaults were kept for all other options.  A separate server running Windows Server 2008 was added to the Exchange 2010 test domain and used to run the LoadGen tests.

Testing

Microsoft has provided a detailed sizing spreadsheet for Exchange 2010 that produces a recommendation once all the needed information is entered.  More simply, Microsoft has also provided a basic memory sizing formula for Exchange 2010: 4 GB plus 3 to 30 MB per user.  The range in this formula reflects that email user types can vary quite a bit, and the amount of RAM needed to support them varies accordingly:

Memory for Exchange 2010 in GB Based on Formula

Users    Low      Middle    High
1000     6.9      18.7      33.3
2000     9.9      33.3      62.6
4000     15.7     62.6      121.2
8000     27.4     121.2     238.4
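
As a sketch of how these values arise: the Low and High columns match 3 MB and 30 MB per user, and the Middle column matches roughly 15 MB per user (an inference from the table, not a figure stated by Microsoft).  A small calculation reproduces the table to within 0.1 GB of rounding:

    def mailbox_memory_gb(users, mb_per_user):
        """Exchange 2010 sizing: 4 GB base plus a per-user amount."""
        return 4 + users * mb_per_user / 1024

    # 3 and 30 MB/user come from the formula; 15 MB/user is inferred.
    for users in (1000, 2000, 4000, 8000):
        low, mid, high = (mailbox_memory_gb(users, mb) for mb in (3, 15, 30))
        print(f"{users:>4} users: {low:.1f} / {mid:.1f} / {high:.1f} GB")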

Testing at various points in this range of memory sizes showed that the Very Heavy LoadGen profile with 100 MB mailboxes, used in these tests, falls at the low end of the memory sizing guidelines.  The following chart shows the results of scaling up the Mailbox VM with vCPUs and RAM as the number of LoadGen users was increased.  The response times reported are averages from a 10-hour test run.

[Figure: Average SendMail response time as users, vCPUs, and RAM are scaled up]

Performance remains very good as the number of users is increased from 1000 to 8000.  The 95th percentile latency increases by less than 150 ms, to a maximum of just under 300 ms.  There is a slight improvement in performance from 1 vCPU to 2 vCPUs, attributed to Exchange not being able to benefit from its multithreaded capabilities with only 1 vCPU.
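
For reference, a 95th percentile like the one reported here can be computed from raw latency samples.  A minimal sketch with made-up numbers (LoadGen reports these statistics directly, so this is purely illustrative):

    import random

    # Hypothetical SendMail latencies in milliseconds.
    random.seed(42)
    latencies = [random.lognormvariate(4.5, 0.5) for _ in range(10_000)]

    def percentile(samples, pct):
        """Nearest-rank percentile of a list of samples."""
        ordered = sorted(samples)
        rank = max(0, round(pct / 100 * len(ordered)) - 1)
        return ordered[rank]

    print(f"95th percentile: {percentile(latencies, 95):.0f} ms")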

CPU utilization for these tests was low, with plenty of room for spikes in load.  In the 8 vCPU and 4 vCPU cases, with 8000 and 4000 users respectively, there was ample CPU available; the same load could have been run with fewer vCPUs.  The chart below shows the CPU utilization for the Mailbox Server VM as well as the CAS and Hub Transport VMs.

[Figure: CPU utilization of the Mailbox Server, CAS, and Hub Transport VMs, normalized to 100 per vCPU]
To plot the CPU utilization of VMs with different numbers of vCPUs on one chart, each vCPU’s worth of compute power is represented by 100; the chart goes up to 800 because the largest VM used was an 8 vCPU VM.  The ratio of CPU usage among the three roles remains the same across all tests, with the Mailbox Server VM using about twice the CPU of the CAS VM and the CAS VM using about twice the CPU of the Hub Transport VM.  With 8000 users, only about 2 vCPUs’ worth of processing was needed in the 8 vCPU VM.
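
In other words, the chart’s units appear to be a VM’s average utilization (as a percentage of its full allocation) multiplied by its vCPU count.  A small sketch of that conversion, with hypothetical percentages:

    def vcpu_normalized(util_percent, num_vcpus):
        """Convert a VM's utilization (% of its full allocation) to
        chart units where one fully busy vCPU equals 100."""
        return util_percent / 100 * num_vcpus * 100

    # Hypothetical: an 8 vCPU VM at 25% utilization is using about
    # 2 vCPUs' worth of compute, i.e. 200 chart units.
    print(vcpu_normalized(25, 8))  # 200.0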

Conclusion

Exchange 2010 performed very well in tests run on VMware vSphere.  In all tests, the amount of RAM needed was at the low end of Microsoft’s guidance, CPU usage was low, and performance remained good.

Watch for the next blog post based on these tests, which will look at Exchange 2010 disk I/O performance.