We at VMware have been fielding a lot of inquiries lately from customers who have virtualized (or are considering virtualizing) their Microsoft Lync Server infrastructure on the VMware vSphere platform. These inquiries center on certain generalized statements contained in the “Planning a Lync Server 2013 Deployment on Virtual Servers” whitepaper published by the Microsoft Lync Server Product Group. In the referenced document, the writers made the following assertions:
- You should disable hyper-threading on all hosts.
- Disable non-uniform memory access (NUMA) spanning on the hypervisor, as this can reduce guest performance.
- Virtualization also introduces a new layer of configuration and optimization techniques for each guest that must be determined and tested for Lync Server. Many virtualization techniques that can lead to consolidation and optimization for other applications cannot be used with Lync Server. Shared resource techniques, including processor oversubscription, memory over-commitment, and I/O virtualization, cannot be used because of their negative impact on Lync scale and call quality.
- Virtual machine portability—the capability to move a virtual machine guest server from one physical host to another—breaks the inherent availability functionality in Lync Server pools. Moving a guest server while operating is not supported in Lync Server 2013. Lync Server 2013 has a rich set of application-specific failover techniques, including data replication within a pool and between pools. Virtual machine-based failover techniques break these application-specific failover capabilities.
VMware has contacted the writers of this document and requested corrections to (or clarification of) the statements because they do not, to our knowledge, convey known facts, and they reflect a fundamental misunderstanding of vSphere features and capabilities. While we await further information from the writers of the referenced document, it has become necessary for us at VMware to publicly provide a direct clarification to our customers who have expressed confusion about the statements above.
RESPONSE HIGHLIGHTS:
- We recommend that customers enable hyper-threading because doing so benefits the ESXi scheduling algorithm and, consequently, the virtualized workloads.
- We recommend that customers enable NUMA. We recommend sizing a VM’s resources to fit within a single NUMA node, and crossing node boundaries only when absolutely necessary and only with a proper understanding of the physical NUMA topology.
- Although we generally recommend against over-provisioning resources for critical workloads, it is possible and easy to over-commit resources within a given vSphere cluster and still ensure adequate resource availability for specific workloads.
- vSphere’s portability and availability features (vMotion, DRS, and vSphere HA) satisfy all of Microsoft’s published requirements for VM portability.
DETAILED RESPONSE:
For the avoidance of any doubt, we are aware that Microsoft fully supports the virtualization of all Microsoft Lync components. See Running Lync Server 2013 on virtual servers, particularly the following statement:
Lync Server 2013 supports virtualization topologies that support all Lync Server workloads, including instant messaging (IM) and presence, conferencing, Enterprise Voice, Monitoring, Archiving, and Persistent Chat.
With regard to the recommendation to disable hyper-threading, the writers did not document the rationale for the recommendation. We infer that the recommendation is based on the following statement contained in the “Hyperthreading” section of the Understanding Processor Configurations and Exchange Performance Guide published by the Microsoft Exchange Server Product team.
Hyperthreading causes capacity planning and monitoring challenges, and as a result, the expected gain in CPU overhead is likely not justified. Hyperthreading should be disabled by default for production Exchange servers and only enabled if absolutely necessary as a temporary measure to increase CPU capacity until additional hardware can be obtained.
We wish to draw the readers’ attention to the fact that the statement above does NOT imply the existence of ANY technical drawback to enabling hyper-threading for a virtualized Microsoft Lync Server workload. Instead, the concern is about capacity planning and monitoring. We share this same concern – this is why we always recommend that our customers size their critical application environments based on the physical processor cores available, not on the logical cores exposed by hyper-threading.
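To make that sizing guidance concrete, the following sketch (plain Python, using hypothetical host figures rather than any particular server model) illustrates how we suggest counting capacity: vCPU allocations are planned against physical cores, while hyper-threading stays enabled solely for the benefit of the ESXi scheduler.

```python
# Capacity-planning sketch: size Lync VMs against PHYSICAL cores only.
# The host values below are hypothetical examples, not prescriptions.

physical_sockets = 2           # CPU packages in the ESXi host
cores_per_socket = 10          # physical cores per package
hyperthreading_enabled = True  # HT stays ON for the ESXi scheduler's benefit

physical_cores = physical_sockets * cores_per_socket                        # 20
logical_processors = physical_cores * (2 if hyperthreading_enabled else 1)  # 40

# Plan vCPUs against the 20 physical cores, NOT the 40 logical processors.
lync_front_end_vcpus = 6
vms_per_host = physical_cores // lync_front_end_vcpus  # 3 VMs without vCPU over-commit

print(f"Logical processors visible to ESXi: {logical_processors}")
print(f"Physical cores used for planning:   {physical_cores}")
print(f"6-vCPU Lync Front End VMs per host without over-commit: {vms_per_host}")
```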
The most alarming-sounding argument against enabling hyper-threading when virtualizing a Microsoft application came from the Exchange Server Product group, in the “Hyperthreading: Wow, free processors!” section of the Ask the Perf Guy: Sizing Exchange 2013 Deployments TechNet entry:
Turn it off. While modern implementations of simultaneous multithreading (SMT), also known as hyperthreading, can absolutely improve CPU throughput for most applications, the benefits to Exchange 2013 do not outweigh the negative impacts….This significant increase in memory, along with an analysis of the actual CPU throughput increase for Exchange 2013 workloads in internal lab tests has led us to a best practice recommendation that hyperthreading should be disabled for all Exchange 2013 servers. The benefits don’t outweigh the negative impact.
The above statement is persuasive, but it is irrelevant to the vSphere virtualization platform, and the writer of the TechNet entry was good enough to acknowledge that and accurately clarify the statement thus:
There’s an important caveat to this recommendation for customers who are virtualizing Exchange. Since the number of logical processors visible to a virtual machine is determined by the number of virtual CPUs allocated in the virtual machine configuration, hyperthreading will not have the same impact on memory utilization described above. It’s certainly acceptable to enable hyperthreading on physical hardware that is hosting Exchange virtual machines, but make sure that any capacity planning calculations for that hardware are based purely on physical CPUs…… –Jeff Mealiffe, Principal Program Manager Lead, Exchange Customer Experience
We highly recommend that customers ignore this recommendation to disable hyper-threading for virtualized Microsoft Lync workloads. We have documented the performance benefits that we derive from hyper-threading in the “Hyper-threading” section of our Performance Best Practices for VMware vSphere® 5.5 Guide.
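For readers who want to verify how their hosts are actually configured before sizing, here is a minimal sketch using the open-source pyVmomi SDK that reports each ESXi host’s hyper-threading state alongside its physical and logical processor counts. The vCenter address and credentials are placeholders, and the SSL handling shown is only one option; adjust both to your environment and pyVmomi version.

```python
# Sketch: report hyper-threading state and core counts for each ESXi host (pyVmomi).
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="********", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        cpu = host.hardware.cpuInfo          # physical packages/cores/threads
        ht = host.config.hyperThread         # hyper-threading availability and state
        print(f"{host.name}: HT available={ht.available} active={ht.active} "
              f"physical cores={cpu.numCpuCores} logical threads={cpu.numCpuThreads}")
finally:
    Disconnect(si)
```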
Similarly, there is no disputing the fact that the Microsoft Windows Operating System is sufficiently modern and advanced to recognize and leverage the benefits of the Non-Uniform Memory Access optimization techniques of modern processor hardware. The writers of the referenced document did not advance any technical rationale for the following recommendation:
Disable non-uniform memory access (NUMA) spanning on the hypervisor, as this can reduce guest performance.
The most relevant authoritative source for NUMA discussion that we could find on Microsoft’s website is the Best Practices for Virtualizing and Managing Exchange 2013 Guide which, incidentally, has the following favorable statements regarding the benefits of NUMA:
….In addition, more advanced performance features, such as in-guest Non-Uniform Memory Access (NUMA), are supported by Windows Server 2012 Hyper-V virtual machines. Providing these enhancements helps to ensure that customers can achieve the highest levels of scalability, performance, and density for their mission-critical workloads….NUMA is a memory design architecture that delivers significant advantages over the single system bus architecture and provides a scalable solution to memory access problems. – Page 11
Although Exchange 2013 is not NUMA-aware, it takes advantage of the Windows scheduler algorithms that keep threads isolated to particular NUMA nodes; however, Exchange 2013 does not use NUMA topology information…. -Page 49
The Windows Operating System is NUMA-aware, and the presence of NUMA capabilities in the OS has not been shown to hurt any of the Lync Server components in any of our tests or in any of our customers’ environments. The document under discussion does not contain any facts alluding to such an incompatibility. The “Non-Uniform Memory Access (NUMA)” section of our Performance Best Practices for VMware vSphere® 5.5 Guide contains our rationale for recommending that customers enable NUMA in their vSphere environments for their virtualized business-critical applications. In the absence of a proven incompatibility with NUMA, we continue to make this recommendation to customers looking to improve performance for their Microsoft Lync Servers hosted on the vSphere platform.
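As an illustration of the single-NUMA-node sizing guidance referenced above, the short sketch below (plain Python, with a hypothetical host topology) checks whether a proposed Lync VM’s vCPU and memory allocation can be satisfied entirely from one NUMA node.

```python
# Sketch: check whether a proposed VM fits within one NUMA node.
# Host topology values are hypothetical; substitute your own hardware's figures.

numa_nodes = 2                # typically one node per physical socket
cores_per_node = 10           # physical cores per NUMA node
host_memory_gb = 256          # total host RAM
memory_per_node_gb = host_memory_gb / numa_nodes   # ~128 GB local to each node

def fits_in_one_numa_node(vcpus: int, vram_gb: int) -> bool:
    """True if the VM's vCPUs and memory can be served by a single NUMA node."""
    return vcpus <= cores_per_node and vram_gb <= memory_per_node_gb

# Example Lync Front End sizings
print(fits_in_one_numa_node(vcpus=6, vram_gb=32))    # True  -> all memory access stays local
print(fits_in_one_numa_node(vcpus=12, vram_gb=160))  # False -> this VM would span NUMA nodes
```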
Because it is possible to over-commit resources within a given vSphere cluster while simultaneously guaranteeing resources for specific workloads (through the use of reservations, limits, shares, or resource pools), the third recommendation contained in the referenced whitepaper is neither accurate nor relevant in a vSphere infrastructure. While we strongly encourage our customers to avoid over-provisioning and over-committing resources for critical applications, vSphere enables our customers to guarantee allocated resources to their Lync Servers while taking advantage of some of the major benefits of virtualization – efficient resource sharing, consolidation, and utilization. Critical application workloads such as Lync can be allocated a reserved amount of resources, which are then not available for contention by lower-priority workloads.
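To show what guaranteeing resources can look like in practice, here is a minimal pyVmomi sketch that applies a full CPU and memory reservation to a Lync VM. The vCenter address, the VM name (lync-fe-01), and the reservation figures are illustrative assumptions, not prescriptive values.

```python
# Sketch: reserve CPU (MHz) and memory (MB) for a Lync VM so its allocation is not
# subject to contention, even in an otherwise over-committed cluster.
# Connection details, VM name, and reservation values are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="********", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "lync-fe-01")   # hypothetical VM name

    spec = vim.vm.ConfigSpec()
    spec.cpuAllocation = vim.ResourceAllocationInfo(reservation=15000)     # ~6 x 2.5 GHz cores, in MHz
    spec.memoryAllocation = vim.ResourceAllocationInfo(reservation=32768)  # 32 GB, in MB

    task = vm.ReconfigVM_Task(spec=spec)   # apply the reservations
    print(f"Reconfigure task submitted for {vm.name}: {task.info.state}")
finally:
    Disconnect(si)
```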
On the fourth point, where the writers state that VM portability “breaks the inherent availability functionality in Lync Server pools”, we are unaware of the “breakage” alluded to in the document. VMware’s “portability” feature is vMotion, a feature that has long been used with clustered critical applications such as Microsoft Exchange Server (DAG) and Microsoft SQL Server (MSCS or AlwaysOn). We are not aware of any documented incidents of “breakage” attributable to vMotion operations on these workloads, or on Lync.
In the “Host-based failover clustering and migration for Exchange” section of its Exchange 2013 virtualization whitepaper, Microsoft defined the following strict criteria for its support of VM “portability” for Exchange workloads:
- Does Microsoft support third-party migration technology? Microsoft can’t make support statements for the integration of third party hypervisor products using these technologies with Exchange, because these technologies aren’t part of the Server Virtualization Validation Program (SVVP). The SVVP covers the other aspects of Microsoft support for third-party hypervisors. You need to ensure that your hypervisor vendor supports the combination of their migration and clustering technology with Exchange. If your hypervisor vendor supports their migration technology with Exchange, Microsoft supports Exchange with their migration technology.
- How does Microsoft define host-based failover clustering? Host-based failover clustering refers to any technology that provides the automatic ability to react to host-level failures and start affected virtual machines on alternate servers. Use of this technology is supported given that, in a failure scenario, the virtual machine is coming up from a cold boot on the alternate host. This technology helps to make sure that the virtual machine never comes up from a saved state that’s persisted on disk because it will be stale relative to the rest of the DAG members.
- What does Microsoft mean by migration support? Migration technology refers to any technology that allows a planned move of a virtual machine from one host machine to another host machine. This move could also be an automated move that occurs as part of resource load balancing, but it isn’t related to a failure in the system. Migrations are supported as long as the virtual machines never come up from a saved state that’s persisted on disk. This means that technology that moves a virtual machine by transporting the state and virtual machine memory over the network with no perceived downtime is supported for use with Exchange. A third-party hypervisor vendor must provide support for the migration technology, while Microsoft provides support for Exchange when used in this configuration.
vMotion, DRS, and vSphere HA satisfy all of those requirements without exception.
Granted, when not properly configured, a vMotion operation can lead to brief network packet loss, which can interfere with the relationship among clustered VMs. This is a known technical condition in Windows clustering and is not unique to vMotion operations. The condition is well understood within the industry and documented by Microsoft in its Tuning Failover Cluster Network Thresholds whitepaper.
Microsoft provides further helpful guidance in the following publication: Having a problem with nodes being removed from active Failover Cluster membership?
Backup vendors have also incorporated these considerations into their publications. See: How do I avoid failover between DAG nodes while the VSS snapshot is being used?
Like most other third-party vendors supporting Microsoft’s Windows Operating System and applications, VMware has incorporated many of the tuning and optimization steps recommended in that whitepaper into our guides and recommendations to our customers. See our Microsoft Exchange 2013 on VMware Best Practices Guide for an example.
The Microsoft Exchange 2013 on VMware Best Practices Guide includes several other configuration prescriptions that, when adhered to, minimize the possibility of an unintended failover of clustered Microsoft application VMs, including Lync Server nodes. We wish to stress that our “portability” features do not negate or impair the native availability features of Microsoft Lync Server workloads.
We are unaware of any technical impediment to combining vSphere’s robust and proven host-level clustering and availability features with Microsoft Lync Server’s application-level availability features, and we encourage our customers to continue to confidently leverage this combination when virtualizing their Lync Servers on the vSphere platform. In the absence of any documented and proven incompatibility among these features, we are confident that customers virtualizing their Microsoft Lync Server infrastructure on the vSphere platform will continue to enjoy the full benefits of support to which they are contractually entitled, without any inhibition.
In the unlikely event that virtualizing Lync Server workloads results in a refusal of support from Microsoft, customers can open a support request with VMware’s Global Support Services, and VMware will leverage the framework of support agreements among members of the TSANet “Multi Vendor Support Community” to provide the necessary support. Both Microsoft and VMware are members of the TSANet Alliance.