16,000 Exchange Mailboxes, 1 Server

We recently finished a large Exchange 2007 capacity test on VMware
ESX Server 3.5. How large? Well, larger than anything ever done before on a
single server. And we did it from start to finish in about two weeks.

We did this test because we have felt for a while that
advances in processor and server technology were about to leave another
widely used and important application unable to take full advantage of the hardware that
vendors were offering. Microsoft has guidelines on which environments work well
with Exchange, and a system with more than eight CPUs or 32GB of RAM exceeds
the recommended maximums.

Hardware vendors are now offering commodity servers with 16 cores (quad socket with four cores each) and enough memory slots to hold 256GB of RAM. Within a year or two we would expect this to go up even further, with commodity x86 systems being built with 32 cores. Microsoft Exchange deployments
typically work well with the ‘scale out’ model, but that causes server proliferation and underutilized hardware, especially as systems get this large.  VMware ESX Server allows us to make more effective use of the hardware and improve capacity.

Using VMware ESX Server 3i version 3.5, we created eight virtual machines, each with two vCPUs and 14GB of memory, and configured 2,000 mailboxes on each one. We chose 2,000 mailboxes per virtual machine based on Microsoft’s recommendation of 1,000 mailboxes per core, and we selected 14GB of memory in accordance with the recommendation of 4GB + 5MB per mailbox (roughly 13.8GB for 2,000 mailboxes, rounded up). We used the hardware recommendations for Exchange Server in a multi-role configuration because each virtual machine was running the Hub Transport, Client Access (CAS), and Unified Messaging (UM) roles in addition to hosting the mailboxes.
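The per-VM sizing follows directly from the two guidelines quoted above. The short Python sketch below simply redoes that arithmetic and the host-level totals; the constant and function names are illustrative, and it assumes 1GB = 1,024MB when applying the 4GB + 5MB/mailbox rule.

```python
# Sizing arithmetic for the configuration described above (a sketch).
# Guideline values are the ones quoted in the text; names are illustrative.
# Assumption: 1GB = 1,024MB when applying the 4GB + 5MB/mailbox rule.

MAILBOXES_PER_CORE = 1_000   # Microsoft guideline cited in the text
BASE_MEMORY_MB = 4 * 1024    # fixed 4GB per server
PER_MAILBOX_MB = 5           # plus 5MB per mailbox


def vm_sizing(vcpus: int) -> tuple[int, float]:
    """Return (mailboxes, recommended memory in GB) for one VM."""
    mailboxes = vcpus * MAILBOXES_PER_CORE
    memory_mb = BASE_MEMORY_MB + PER_MAILBOX_MB * mailboxes
    return mailboxes, memory_mb / 1024


mailboxes, memory_gb = vm_sizing(vcpus=2)
print(mailboxes, round(memory_gb, 2))  # 2000 13.77 -> configured as 14GB

# Host-level totals for eight such VMs on the 128GB test server.
VMS = 8
CONFIGURED_MEMORY_GB = 14
HOST_MEMORY_GB = 128

total_mailboxes = VMS * mailboxes
total_vcpus = VMS * 2
total_vm_memory_gb = VMS * CONFIGURED_MEMORY_GB
print(total_mailboxes, total_vcpus, total_vm_memory_gb,
      HOST_MEMORY_GB - total_vm_memory_gb)  # 16000 16 112 16
```

Eight 14GB virtual machines account for 112GB of guest memory, leaving 16GB of the 128GB host unassigned to guests.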

We ran this test on an IBM x3850 M2 server with 128GB of RAM. The virtual machines ran Microsoft Windows Server 2003 R2 Datacenter x64 Edition with Service Pack 2 and Microsoft Exchange Server 2007 (version 8) with Service Pack 1.

The storage used for these tests was an EMC CX3-40 with 225 disks (15 drawers of 15 disks each). Each virtual machine was configured to use two LUNs of 10 disks each for the Exchange database and a three-disk LUN for logs.
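A quick spindle count helps put the storage layout in perspective. This Python sketch assumes each LUN sits on its own dedicated disks (the text gives disk counts per LUN but does not state this explicitly); the post does not say how any remaining disks in the array were used.

```python
# Spindle accounting for the CX3-40 layout described above (a sketch).
# Assumption: LUNs use dedicated disks, i.e. no disk is shared between LUNs.

TOTAL_DISKS = 15 * 15        # 15 drawers of 15 disks
VMS = 8
DB_LUNS_PER_VM = 2
DISKS_PER_DB_LUN = 10
DISKS_PER_LOG_LUN = 3
MAILBOXES_PER_VM = 2_000

db_disks_per_vm = DB_LUNS_PER_VM * DISKS_PER_DB_LUN          # database spindles per VM
disks_per_vm = db_disks_per_vm + DISKS_PER_LOG_LUN            # plus the log LUN
disks_used = VMS * disks_per_vm                               # across all eight VMs
mailboxes_per_db_disk = MAILBOXES_PER_VM / db_disks_per_vm    # load per database spindle

print(TOTAL_DISKS, disks_used, TOTAL_DISKS - disks_used, mailboxes_per_db_disk)
# 225 184 41 100.0
```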

We used the Microsoft Load Generator (LoadGen) tool to drive the load on the mailboxes, and ran with the heavy user profile.  Here are the LoadGen settings:

  • Simulated day – 8 hours
  • Test run – 8 hours
  • Stress mode – disabled
  • No distribution lists or dynamic distribution lists for internal messages
  • No contacts for outgoing messages
  • No external outbound SMTP mail
  • Profile used: Outlook 2007 Online, Heavy, with Pre-Test Logon

We ran the tests using both ESX Server 3.5 and ESX Server 3i version 3.5, and the performance was the same across both versions. Tests were run with one through eight virtual machines, and even in the eight-virtual-machine case about half of the CPU resources were still available.

Disk latencies were around 6ms across our runs. The IOPS rate started at about 0.65 IOPS per mailbox in the first hour but stabilized at 0.37 IOPS per mailbox in the last hour, once the cache had warmed up. Averaged over the whole run, the rate was 0.45 IOPS per mailbox. The observed read/write ratio was approximately 4:1.
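To translate the per-mailbox rates into array-level load, the Python sketch below multiplies them by the 16,000 mailboxes of the full eight-VM configuration and splits the result using the 4:1 read/write ratio. It assumes the per-mailbox rates hold unchanged at the full load; the text reports them "across our runs", so the totals are estimates, not separately measured values.

```python
# Estimate total array IOPS from the per-mailbox rates (a sketch).
# Assumption: per-mailbox rates apply uniformly at 16,000 mailboxes.

MAILBOXES = 16_000           # eight VMs x 2,000 mailboxes
READ_WRITE_RATIO = 4         # approx. 4:1 reads to writes

rates = {"first hour": 0.65, "last hour": 0.37, "run average": 0.45}


def split_iops(total: float, ratio: float) -> tuple[float, float]:
    """Split a total IOPS figure into (reads, writes) for a ratio:1 mix."""
    writes = total / (ratio + 1)
    return total - writes, writes


for label, per_mailbox in rates.items():
    total = MAILBOXES * per_mailbox
    reads, writes = split_iops(total, READ_WRITE_RATIO)
    print(f"{label}: {total:,.0f} IOPS ({reads:,.0f} reads, {writes:,.0f} writes)")

# first hour: 10,400 IOPS (8,320 reads, 2,080 writes)
# last hour: 5,920 IOPS (4,736 reads, 1,184 writes)
# run average: 7,200 IOPS (5,760 reads, 1,440 writes)
```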

Sendmail latency is an important measure of the responsiveness of the Exchange Server. Figure 1 shows how it changed as more virtual machines were added to the system.


Figure 1. Sendmail Latency

A 1,000ms response time is considered the threshold at which the user experience starts to degrade. As the 95th percentile response times in Figure 1 show, there’s still a significant amount of headroom on
this server, even at our highest tested load level.
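For readers less familiar with percentile reporting, the Python sketch below shows one way to compute a 95th percentile and compare it with the 1,000ms threshold. It uses the nearest-rank definition, which is our choice; LoadGen's own percentile method is not described in the text, and the sample latencies are hypothetical, not measurements from this test.

```python
import math

THRESHOLD_MS = 1_000  # user-experience threshold cited in the text


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms, NOT data from this test.
sample_ms = [120, 150, 180, 210, 240, 260, 300, 350, 420, 900]
p95 = percentile_nearest_rank(sample_ms, 95)
print(p95, p95 < THRESHOLD_MS)  # 900 True
```

Reporting the 95th percentile rather than the mean keeps a handful of slow requests from being hidden by many fast ones, which is why it is the usual yardstick against a user-experience threshold.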

These tests ran smoothly and confirmed what we expected, which should come as no
surprise: as new hardware becomes available, the scalability of ESX Server lets us easily make productive use of the additional capacity.

It took many hours and creative hardware "repurposing" from our lab personnel to put this setup together within a couple of days, and it’ll probably take them even longer to get everything back to its original place.  I’d like to acknowledge that without their efforts, we wouldn’t have been able to get this done.


The many companies already running Microsoft Exchange Server on VMware ESX Server are seeing improved resource utilization and better manageability, as well as lower space, power, and cooling costs. New servers with greater processing power make the transition to Exchange on ESX Server even more compelling.


19 comments have been added so far

  1. So, what happens when you do turn on some of the other LoadGen settings (No distribution lists or dynamic distribution lists for internal messages, No contacts for outgoing messages, No external outbound SMTP mail) or enable Stress Mode?

  2. … and another thing, what about SQL Server 2005? In our ESX Server 3.0.2 + VC 2.0.2 environment on IBM x3850 hosts connected to a CX700, SQL Server 2005 32-bit or 64-bit does not seem to perform very well.

  3. 1. Trying this out with Domino is definitely something we would like to do. Hopefully we’ll be able to put something together soon.
    2. We ran with what are considered “normal” settings for LoadGen. One setting that we have seen other tests use is “cached” mode instead of “online”, which makes the test lighter.
    3. We expect SQL Server to run very well in virtual machines running on ESX. Please work with our support organization and/or your account team if you are facing issues with this environment, and we can help you understand what might be happening.
    4. This test was run using FC, mostly because that is the storage we could put together quickly based on availability. We hope to try similar tests with iSCSI soon.

  4. Is there a whitepaper or more data from this testing? I would like to see the RPC latencies and more configuration details (network, etc.).

  5. It would be great if you could share the screenshot which shows that half of the CPU resources were available. We would also be interested to see the disk latency. Which tools did you use to measure it, or did you use the Windows perfmon counters? What was the MBR/partition table for the vmdk file? How did you set up the LUNs for the host? How were the LUNs created on the SAN side? What was the block size on the LUN? When you say each VM was configured for two LUNs, what exactly do you mean by that?

  6. Saw Scott Drummonds & Dave Korsunsky present this at VMworld Europe in Feb. For me it was one of the standout sessions of the event.
    Would like to know if the performance still holds up when bigger building blocks are used (4 vCPU, 22GB RAM). This would keep the licensing costs for Exchange closer to that of a native installation, and therefore make the proposition even more compelling.

  7. What about Edge Transport servers?
    Did you include any in this test?
    If not, is it possible to virtualize Edge Transport servers?

  8. Hey – for those looking for more detailed docs (as well as info on storage configuration and config of virtual edge/transport/cas roles) – a followup was completed jointly after VMworld EMEA. I just posted the detailed Applied Tech Guide on my blog here: http://virtualgeek.typepad.com
    I also thought Scott/Dave’s presentation on the topic was one of the keys of the show. Also want to call out an unsung hero – Derrick Baxter, who did a lot of the heavy lifting to produce the data everyone has been using.

  9. One other quick comment (a followup to Kaushik’s excellent answer): the test was on FC, but we have done the exact same thing with iSCSI and got the same results, except during the streaming backup part of the LoadGen test sequence. The normal Exchange workload period (simulating the Outlook client workload) is IOPS bound, not throughput bound, so FC/iSCSI is a wash. It does make a difference, though (unless you apply a LOT of 1GbE iSCSI connections), during the backup tests, which are throughput bound.

  10. In your pres at VMworld Europe, you mentioned Msft’s recommended maximums were 1,000 avg mailboxes or 500 heavy mailboxes per CPU core. In your paper here, you stated that you went with their guidelines of 1,000 heavy user mailboxes. Is one of those incorrect?

  11. Re: aenagy, I have run SQL in our VM env for some time; in fact, 5 separate instances of SQL 2005 running production services, all just like physical, and no RDMs.

