One of the great new features of vSphere is VMware Fault Tolerance (FT), which runs a VM in lockstep on two different physical servers at the same time. This provides a high availability option with virtually no downtime. A whitepaper focused on FT was recently published, along with a blog post that has the complete details about this great new technology. Using an Exchange Server 2007 mailbox VM, we ran some tests to measure performance with FT for up to 2000 users.
To examine the performance of an FT VM running Exchange Server 2007, a series of tests was run with 1000, 1500, and 2000 users. Performance was measured in terms of CPU utilization and Sendmail response time for the same VM, both with and without FT enabled. The results were used to measure the performance impact of using FT as well as the number of users that a 1 vCPU VM can support. (Today, FT is supported on 1 vCPU VMs.)
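The test plan described above is a simple cross of user counts and FT settings. As a rough sketch only (the names `USER_COUNTS`, `FT_MODES`, and `build_test_matrix` are made up for illustration and are not part of LoadGen or any VMware tool), the six runs can be enumerated like this:

```python
from itertools import product

# User counts and FT settings taken from the test description.
USER_COUNTS = [1000, 1500, 2000]
FT_MODES = [False, True]  # False = FT disabled, True = FT enabled


def build_test_matrix(user_counts=USER_COUNTS, ft_modes=FT_MODES):
    """Return one entry per (user count, FT setting) combination.

    Hypothetical helper for illustration; it only lists the runs,
    it does not drive LoadGen or vCenter.
    """
    return [
        {"users": users, "ft_enabled": ft}
        for users, ft in product(user_counts, ft_modes)
    ]


test_matrix = build_test_matrix()
for run in test_matrix:
    label = "FT on" if run["ft_enabled"] else "FT off"
    print(f"{run['users']} users, {label}")
```

Each user count appears twice, once without FT and once with it, so every FT result has a matching baseline on the same VM.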
I worked with the Dell TechCenter team and used two of their Dell PowerEdge blade servers with Intel Nehalem-based Xeon 5500 processors. The primary server was an M710 with two Intel Xeon X5570 processors running at 2.93GHz and 72GB of RAM. The secondary server was an M610 with the same type of processors, but with 48GB of RAM. The terms primary and secondary refer to the portions of the fault tolerant VMs that the servers hosted during the tests.
Both blade servers were in the same chassis, so all FT logging traffic remained local in the chassis Ethernet switch. The servers connected via iSCSI to EqualLogic PS5000XV storage arrays where the OS, data, and log LUNs for the VMs were stored.
The servers were installed with ESX 4.0 and managed by a vCenter Server. VMs were created with 1 vCPU and 10GB of RAM, and installed with Windows Server 2008 x64 and the Exchange Server 2007 Mailbox role. Another VM, which acted as the domain controller, Hub Transport server, and Client Access server, ran on a third blade server in the same chassis. Microsoft Exchange Load Generator (LoadGen) was used with the Heavy Online user profile to simulate an eight-hour workday.
Fault Tolerant Test Results
The testing showed that the performance of the Exchange VM was affected only slightly when FT was used. Sendmail average latency increased by 10 to 13 milliseconds, and 95th percentile latency increased by 33 to 45 milliseconds. All test results were under the 1000ms threshold at which user experience starts to degrade. These results indicate that, even at 2000 users, the performance of Exchange on a 1 vCPU VM was acceptable with or without FT.
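The comparison behind these numbers is a per-run subtraction followed by a check against the 1000ms threshold. The sketch below shows that arithmetic under stated assumptions: the `LatencySample` type and both helper functions are hypothetical, and the sample values are placeholders chosen only to fall inside the ranges quoted above. They are not the measured results from these tests.

```python
from dataclasses import dataclass

# Threshold from the text: user experience starts to degrade at 1000 ms.
THRESHOLD_MS = 1000.0


@dataclass
class LatencySample:
    """One Sendmail latency result (hypothetical structure for illustration)."""
    users: int
    ft_enabled: bool
    avg_ms: float
    p95_ms: float


def ft_overhead(baseline, with_ft):
    """Return (average delta, 95th percentile delta) in ms for one user count."""
    if baseline.users != with_ft.users:
        raise ValueError("compare runs with the same user count")
    return (with_ft.avg_ms - baseline.avg_ms, with_ft.p95_ms - baseline.p95_ms)


def within_threshold(sample, threshold_ms=THRESHOLD_MS):
    """True if both the average and the 95th percentile are under the threshold."""
    return sample.avg_ms < threshold_ms and sample.p95_ms < threshold_ms


# Placeholder values, NOT measured data from the tests.
baseline = LatencySample(users=2000, ft_enabled=False, avg_ms=100.0, p95_ms=300.0)
with_ft = LatencySample(users=2000, ft_enabled=True, avg_ms=112.0, p95_ms=340.0)

avg_delta, p95_delta = ft_overhead(baseline, with_ft)
print(f"FT overhead: avg +{avg_delta} ms, 95th percentile +{p95_delta} ms")
print(f"Under threshold with FT: {within_threshold(with_ft)}")
```

Checking the 95th percentile as well as the average matters here, because the tail latency grew by more than the average did when FT was enabled.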
The CPU utilization results for the overall system show that FT has a low impact. Because the Exchange VM was the only one on the ESX server, overall system utilization was very low, with a peak of just over 7% in the most stressful test. Enabling FT used only an additional 1 to 1.5% of system CPU. The utilization of the ESX host with the secondary VM was slightly lower than that of the host with the primary. Within the 1 vCPU VM itself, average utilization reached just under 45%. This is a comfortable level that still leaves room for the bursty nature of Exchange.
Enabling FT for an Exchange VM running on the latest server hardware shows good performance for up to the 2000 users tested, and the effect of FT on the workload was relatively small. These results show that an Exchange VM can be a good candidate for using FT to enable increased uptime and availability.