Featured post

A Stronger Case For Virtualizing Exchange Server 2013 - Think "Performance"

VMware is glad to see that the Microsoft Exchange Server (and Performance) teams appear to have identified the prevalent cause of performance-related issues in an Exchange Server 2013 infrastructure. We have known for several years that Microsoft's sizing recommendation for Exchange Server 2013 is the number one cause of the performance issues that have been reported to VMware since the release of Exchange Server 2013, and it is gratifying that Microsoft is acknowledging this as well.

In May of 2015, Microsoft released a blog post titled "Troubleshooting High CPU utilization issues in Exchange 2013" in which it acknowledged (for the first time, to our knowledge) that CPU over-sizing is one of the chief causes of performance issues on Exchange Server 2013. We wish to highlight that the Exchange 2013 Server Role Requirements Calculator is the main culprit in this state of affairs. One thing we noticed with the release of Exchange Server 2013 and its accompanying "Calculator" is the increase in the compute resources it recommends compared to similar configurations in prior versions of Exchange Server.

We at VMware have been drawing our customers' attention to this anomaly and educating them not to take the "Calculator's" recommendation as gospel. Unfortunately, not many customers like to buck Microsoft, especially in the face of strident claims of "This is our product and we are the experts". Sadly, customers who have moved to Exchange Server 2013 using the Calculator's recommendation (or the equally disruptive "Preferred Architecture" design from Microsoft) have invariably been hurt by the unsound recommendation.

Fortunately for everyone concerned, Microsoft appears to be moving in the right direction - if the recent blog post from the Exchange Server Principal PM is an indication of things to come. In his "Ask the Perf Guy: How big is too BIG?" post, Jeff Mealiffe expounded on the revelation in the "Troubleshooting High CPU utilization issues in Exchange 2013" blog post and provided a handy chart of recommended Exchange Server 2013 CPU and memory sizing:

In a nutshell, we recommend not exceeding the following sizing characteristics for Exchange 2013 servers, whether single-role or multi-role (and you are running multi-role, right?).

Recommended Maximum CPU Core Count: 24
Recommended Maximum Memory: 96 GB

If you have upgraded your Exchange infrastructure to Exchange Server 2013, you will do yourself a lot of good by reading the discussions in the links above in their entirety.

A corresponding (and hopefully, more reasonable) "Calculator" has also been released to accompany this new recommendation - Exchange 2013 Server Role Requirements Calculator.

While we are glad that Microsoft has evolved in some ways and the Exchange team is now more open in discussing the inherent defects in Exchange Server 2013, we cannot help but notice that Jeff et al. continue to push the "Combined Role" design recommendation, even though such a design unnecessarily complicates performance troubleshooting and hinders fault-domain isolation. We at VMware once wondered what necessitated Microsoft's change in design prescription around the time of Exchange Server 2013's release (Microsoft previously championed a separated-roles design, with the exception of the CAS/HT roles). Our (speculative) conclusion was that it was the ONLY reasonable design option the Microsoft Exchange Server team could propose in order to continue to justify the "Exchange is Better on Physical" design proposition favored by Microsoft. The "Better on Hardware" mindset is the basis of the Preferred Architecture.

One of the major issues addressed in Jeff's post and the earlier blog posts is the way Exchange pre-allocates memory based on the number of CPU cores that it "sees". We suspect that this is the main reason the Exchange team is not virtualization-friendly. Perhaps on some hypervisors the virtualized Exchange server "sees" ALL the CPUs that the parent host sees, so if the parent host has 64 CPU cores, Exchange Server will count all 64 cores as needing to be accounted for in memory allocation, even if the Exchange VM itself has been allocated only, say, 8 vCPUs. We speculate. But this would be a logical rationale for Microsoft's insistence on a multitudinous proliferation of "itsy-bitsy-sized", silo'ed physical hardware for Exchange Server. For the avoidance of doubt, this is NOT a problem on the vSphere platform: the virtualized Exchange Server does NOT "see" more than the number of CPUs it has been allocated, regardless of the size of the ESXi host's physical hardware.
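
As a quick way to see this for yourself, the trivial in-guest check below prints what the guest operating system (and therefore Exchange) actually sees. The use of Python and the expected vCPU count are our own placeholders for illustration:

```python
# Illustrative in-guest check: confirm the guest OS only sees the vCPUs that
# were allocated to the VM, not the ESXi host's full core count.
import os

EXPECTED_VCPUS = 8   # placeholder: the vCPU count allocated to this Exchange VM

visible = os.cpu_count()   # logical processors visible to the guest OS
print(f"Guest sees {visible} logical processors (expected {EXPECTED_VCPUS}).")

if visible != EXPECTED_VCPUS:
    print("Visible CPU count differs from the allocation - revisit the vCPU "
          "configuration before sizing Exchange memory around it.")
```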

We would like to enthusiastically echo Jeff's conclusion in his blog post:

It’s a fact that many of you have various constraints on the hardware that you can deploy in your datacenters, and often those constraints are driven by a desire to reduce server count, increase server density, etc. Within those constraints, it can be very challenging to design an Exchange implementation that follows our scalability guidance and the Preferred Architecture. Keep in mind that in this case, virtualization may be a feasible option rather than a risky attempt to circumvent scalability guidance and operate extremely large Exchange servers. Virtualization of Exchange is a well understood, fairly common solution to this problem, and while it does add complexity (and therefore some additional cost and risk) to your deployment, it can also allow you to take advantage of large hardware while ensuring that Exchange gets the resources it needs to operate as effectively as possible.

One more item of importance: PLEASE do not rely on the CPU/RAM sizing recommendation of the Exchange Calculator as the sole determinant of how much compute resource you will allocate to your VIRTUALIZED Exchange servers. In addition to the Calculator being intentionally generous, its conservative targets for maximum utilization of the allocated resources are VERY PROBLEMATIC in a virtual environment. One of the major tenets of virtualization is resource sharing. To ensure equitable sharing of the pooled resources in a virtualized environment, it is important that a VM does not unnecessarily hog resources. A VM should have adequate access to the compute resources it NEEDS whenever it needs them. However, it should not have more compute resources than it needs; otherwise it contravenes the principles of "equitable sharing" and "fairness" in the virtual environment.

Here is a sample of the recommended compute resources that the Exchange Server team released with the latest Exchange Server 2013 Calculator:

[Image: Exchange Server 2013 Role Requirements Calculator output showing recommended CPU and memory sizing - http://blogs.technet.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-31-06-metablogapi/Calc1_5F00_27446058.png]

Source: Microsoft Exchange Server Blog Site

At the most extreme, the Exchange servers shown in the image above are NOT expected to exceed a 46% utilization threshold; in steady-state operation, the target is 28% of allocated resources. In any scenario, these numbers would be considered gross wastage, with the type of ROI that gives a CFO persistent ulcers. In a virtualized environment, such gross under-utilization is noticeably detrimental to the virtualized workloads. In our experience, baselining your virtualized Exchange workload at 70% of the Calculator's recommended sizes has consistently been a prudent choice for our customers. One of the benefits of virtualization is that adjusting this allocation upwards is a trivial exercise that takes no more than 5 minutes of scheduled downtime - a much better proposition than oversizing and running into not only the issues described in the blog posts above, but also having the VM under-perform because it cannot judiciously use its monster-sized allocation. This is why we caution our customers against basing their Exchange Server virtualization projects on the prescriptions of the Microsoft Preferred Architecture (PA). The Preferred Architecture assumes that Exchange will be hosted on physical servers, so it has no notion of the "fairness" doctrine described above. Trying to retrofit a Preferred Architecture design onto a virtual environment invariably leads to severe performance issues.
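
To make the arithmetic concrete, here is a minimal sketch of the 70% baseline approach described above. It is our own illustration, not a VMware or Microsoft tool, and the Calculator output used in the example (24 cores, 96 GB) is simply the maximum from the table earlier in this post:

```python
# Illustrative only: scale the Exchange Calculator's recommendation down to a
# 70% starting baseline for a virtualized deployment, then grow the allocation
# if monitoring shows sustained pressure. The helper name is our own example.
import math

def baseline_allocation(calc_vcpus, calc_memory_gb, factor=0.7):
    """Return a starting (vCPU, memory GB) allocation at `factor` of the
    Calculator's recommendation, rounding up to whole units."""
    vcpus = max(1, math.ceil(calc_vcpus * factor))
    memory_gb = max(4, math.ceil(calc_memory_gb * factor))
    return vcpus, memory_gb

# Example: the Calculator recommends 24 cores and 96 GB for a multi-role server.
vcpus, memory_gb = baseline_allocation(24, 96)
print(f"Start the VM at {vcpus} vCPUs and {memory_gb} GB, then adjust upwards as needed.")
```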

Featured post

Say Hello to vMotion-compatible Shared-Disks Windows Clustering on vSphere

As you dive into the inner-workings of the new version of VMware vSphere (aka ESXi), one of the gems you will discover to your delight is the enhanced virtual machine portability feature that allows you to vMotion a running pair of clustered Windows workloads that have been configured with shared disks.

I pause here now to let you complete the obligatory jiggy dance. No? You have no idea what I just talked about up there, do you? Let me break it down for you:
In vSphere 6.0, you can configure two or more VMs running Windows Server Failover Clustering (or MSCS for pre-Windows Server 2012 operating systems) with common, shared virtual disks (RDMs) among them AND still be able to successfully vMotion any of the clustered nodes without inducing failure in WSFC or the clustered application.

What's the big deal about that? Well, it is the first time VMware has ever officially supported such a configuration without any third-party solution, formal exception, or long list of caveats. Simply put, this is now an official, out-of-the-box feature that has no exceptions or special requirements other than the following (a brief configuration sketch follows the list):
  • The VMs must be in "Hardware 11" compatibility mode - which means that you are either creating and running the VMs on ESXi 6.0 hosts, or you have converted your old template to Hardware 11 and deployed it on ESXi 6.0
  • The disks must be connected to virtual SCSI controllers that have been configured for "Physical" SCSI Bus Sharing mode
  • And the disk type *MUST* be of the "Raw Device Mapping" type. VMDK disks are *NOT* supported for the configuration described in this document.
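
For readers who script their VM provisioning, here is a minimal sketch of the two device settings the list above calls out - a SCSI controller in "Physical" bus-sharing mode and a physical-compatibility RDM attached to it. The choice of Python with the pyVmomi SDK is ours for illustration (nothing in the feature requires it), and the target VM, LUN path, pointer-file location, and size are all placeholders.

```python
# Hedged sketch only: the pyVmomi SDK, the helper name, and all values below
# are our own illustration, not something mandated by the vSphere feature.
from pyVmomi import vim

def shared_rdm_spec(lun_device_path, pointer_file, lun_size_kb):
    """Build a ConfigSpec that adds a physically-shared SCSI controller (bus 1)
    and a physical-compatibility RDM on that controller to an existing VM."""
    # SCSI controller with "Physical" bus sharing, as required above.
    controller = vim.vm.device.VirtualLsiLogicSASController()
    controller.key = -101                   # temporary negative key within this spec
    controller.busNumber = 1
    controller.sharedBus = 'physicalSharing'

    controller_spec = vim.vm.device.VirtualDeviceSpec()
    controller_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    controller_spec.device = controller

    # Raw Device Mapping in physical compatibility mode (VMDKs are not supported here).
    backing = vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo()
    backing.compatibilityMode = 'physicalMode'
    backing.deviceName = lun_device_path    # e.g. "/vmfs/devices/disks/naa.<id>"
    backing.diskMode = 'independent_persistent'
    backing.fileName = pointer_file         # datastore path for the RDM pointer file

    disk = vim.vm.device.VirtualDisk()
    disk.key = -102
    disk.controllerKey = controller.key     # attach to the shared controller above
    disk.unitNumber = 0
    disk.capacityInKB = lun_size_kb         # should match the size of the backing LUN
    disk.backing = backing

    disk_spec = vim.vm.device.VirtualDeviceSpec()
    disk_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    disk_spec.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    disk_spec.device = disk

    return vim.vm.ConfigSpec(deviceChange=[controller_spec, disk_spec])

# Usage (placeholders, first cluster node shown):
# task = vm.ReconfigVM_Task(shared_rdm_spec(
#     "/vmfs/devices/disks/naa.600508b1001c...",
#     "[SharedDatastore] wsfc/quorum.vmdk",
#     10 * 1024 * 1024))
```

Additional cluster nodes attach the existing RDM pointer file on their own physically-shared controller rather than creating a new one.
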
Featured post

Virtualizing Microsoft Lync Server – Let's Clear up the Confusion

We at VMware have been fielding a lot of inquiries lately from customers who have virtualized (or are considering virtualizing) their Microsoft Lync Server infrastructure on the VMware vSphere platform. The inquiries center on certain generalized statements contained in the “Planning a Lync Server 2013 Deployment on Virtual Servers” whitepaper published by the Microsoft Lync Server Product Group. In the referenced document, the writers make the following assertions:

  • You should disable hyper-threading on all hosts.
  • Disable non-uniform memory access (NUMA) spanning on the hypervisor, as this can reduce guest performance.
  • Virtualization also introduces a new layer of configuration and optimization techniques for each guest that must be determined and tested for Lync Server. Many virtualization techniques that can lead to consolidation and optimization for other applications cannot be used with Lync Server. Shared resource techniques, including processor oversubscription, memory over-commitment, and I/O virtualization, cannot be used because of their negative impact on Lync scale and call quality.
  • Virtual machine portability—the capability to move a virtual machine guest server from one physical host to another—breaks the inherent availability functionality in Lync Server pools. Moving a guest server while operating is not supported in Lync Server 2013. Lync Server 2013 has a rich set of application-specific failover techniques, including data replication within a pool and between pools. Virtual machine-based failover techniques break these application-specific failover capabilities.

VMware has contacted the writers of this document and requested corrections to (or clarification of) the statements because they do not, to our knowledge, convey known facts and they reflect a fundamental misunderstanding of vSphere features and capabilities. While we await further information from the writers of the referenced document, it has become necessary for us at VMware to publicly provide a direct clarification to our customers who have expressed confusion at the statements above. Continue reading

Featured post

Disabling TPS in vSphere - Impact on Critical Applications

Starting with the update releases of December 2014, VMware vSphere will default to a new configuration for the Transparent Page Sharing (TPS) feature. Unlike in prior versions of vSphere, TPS will be DISABLED by default, and it will continue to be disabled in all future versions of vSphere.

In the interim, VMware has released a patch for vSphere 5.5 that changes the behavior of (and provides additional configuration options for) TPS. Similar patches will also be released for prior versions at a later date.

Why are we doing this?

In a nutshell, independent research indicates that TPS can be abused to gain unauthorized access to data under certain highly controlled conditions. In line with its "secure by default" security posture, VMware has opted to change the default behavior of TPS and provide customers with a configurable option for selectively and more securely enabling TPS in their environment. Please read "Security considerations and disallowing inter-Virtual Machine Transparent Page Sharing (2080735)" for more detailed discussion of the security issues and VMware's response. Continue reading
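
For administrators who prefer to inspect or change this setting programmatically rather than through the vSphere Client, here is a minimal sketch. The use of Python with pyVmomi is our own choice of tooling, the option name Mem.ShareForceSalting and its values reflect our reading of VMware KB 2097593, and the host, credentials, and chosen value are placeholders you must adapt after weighing the security trade-off described in the KB articles.

```python
# Hedged sketch only: pyVmomi is our own choice of tooling, the host name and
# credentials are placeholders, and whether to re-enable inter-VM sharing is a
# security decision you must make after reading the KB articles above.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab convenience only; validate certs in production
si = SmartConnect(host="esxi01.example.com", user="root", pwd="********", sslContext=ctx)
try:
    # When connected directly to an ESXi host, the single HostSystem sits under
    # the default datacenter's hostFolder.
    host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
    adv_opts = host.configManager.advancedOption

    current = adv_opts.QueryOptions("Mem.ShareForceSalting")[0]
    print(f"Mem.ShareForceSalting is currently {current.value}")

    # Per our reading of VMware KB 2097593: 2 (the new default) restricts page
    # sharing to within a VM unless identical salts are configured, while 0
    # restores the pre-patch inter-VM sharing behavior.
    new_value = type(current.value)(0)   # keep the option's numeric type
    adv_opts.UpdateOptions(changedValue=[
        vim.option.OptionValue(key="Mem.ShareForceSalting", value=new_value)])
finally:
    Disconnect(si)
```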

Featured post

Just Published - Virtualizing Active Directory Domain Services On VMware vSphere®

Announcing the latest addition to our series of prescriptive guidance for virtualizing Business Critical Applications on the VMware vSphere platform.

Microsoft Windows Active Directory Domain Services (AD DS) is one of the most pervasive directory services platforms on the market today. Because of the importance of AD DS to the operation and availability of other critical services, applications, and processes, the stability and availability of AD DS itself are very important to most organizations.

Although the "Virtualization First" concept is becoming a widely accepted operational practice in the enterprise, many IT shops are still reluctant to completely virtualize domain controllers. The most conservative organizations have an absolute aversion to domain controller virtualization, while more moderate organizations choose to virtualize a portion of the AD DS environment and retain a portion on physical hardware. Empirical data indicates that this opposition to domain controller virtualization stems from a combination of historical artifacts, misinformation, lack of virtualization experience, and fear of the unknown.

The new Guide - Virtualizing Active Directory Domain Services On VMware vSphere® - is intended to help the reader overcome this inhibition by clearly addressing both the legacy artifacts and the new advancements in Microsoft Windows that help ameliorate the past deficiencies and make AD DS more virtualization-friendly - and safer.

With the release of Windows Server 2012, new features alleviate many of the legitimate concerns that administrators have about virtualizing AD DS. These new features, the latest versions of VMware® vSphere®, and recommended practices help achieve 100 percent virtualization of AD DS.

The Guide includes a number of best practice guidelines for administrators to help them optimally and safely deploy AD DS on VMware vSphere.

The recommendations in this guide are not specific to a particular set of hardware or to the size and scope of a specific AD DS implementation. The examples and considerations in this document provide guidance, but do not represent strict design requirements.

Similar to other "Best Practices" releases from VMware, this Guide is intended to serve as your companion and primary reference if you have any responsibility for planning, designing, implementing, or operating a virtualized Active Directory Domain Services instance in a VMware vSphere infrastructure.

This guide assumes a basic knowledge and understanding of vSphere and AD DS.

  • Architectural staff can use this document to understand the design considerations for deploying a virtualized AD DS environment on vSphere.
  • Engineers and administrators can use this document as a catalog of technical capabilities.
  • Management staff and process owners can use this document to help model business processes that take advantage of the savings and operational efficiencies achieved with virtualization.

You can download Virtualizing Active Directory Domain Services On VMware vSphere® here.

Featured post

Which vSphere Operation Impacts Windows VM-Generation ID?

In Windows Server 2012 VM-Generation ID Support in vSphere, we introduced you to VMware's support for the new Microsoft Windows VM-Generation ID feature and discussed how it helps address some of the challenges facing Active Directory administrators looking to virtualize domain controllers.

One of the common requests from customers in response to the referenced article is a list of events and conditions under which an administrator can expect the VM-Generation ID of a virtual machine to change in a VMware vSphere infrastructure. The table below presents this list. This table will be included in an upcoming Active Directory on VMware vSphere Best Practices Guide.

Scenario: VM-Generation ID change?

VMware vSphere vMotion®/VMware vSphere Storage vMotion: No
Virtual machine pause/resume: No
Virtual machine reboot: No
HA restart: No
FT failover: No
vSphere host reboot: No
Import virtual machine: Yes
Cold clone: Yes
Hot clone: Yes (Note: hot cloning of virtual domain controllers is not supported by either Microsoft or VMware. Do not attempt hot cloning under any circumstances.)
New virtual machine created from a copy of an existing virtual disk (VMDK): Yes
Cold snapshot revert (snapshot taken while powered off, or while running without a memory snapshot): Yes
Hot snapshot revert (snapshot taken while powered on with a memory snapshot): Yes
Restore from virtual machine level backup: Yes
Virtual machine replication (both host-based and array-level replication): Yes

If you have a specific operation or task that is not included in the table above, please be sure to ask in the comments section.

Thank you.

Updated availability guide for vCenter 5.5 with Microsoft Clustering support now available

The original vCenter Server 5.5 Availability Guide was published in December 2014.

With the End of Availability of vCenter Server Heartbeat, guidance was provided on how to monitor and protect vCenter. Because additional protection was still needed, we have internally validated the use of Windows Server Failover Clustering to protect vCenter services; improved SLAs can be attained with this clustering solution. The update provides step-by-step guidance for deploying this solution to protect vCenter Server 5.5.

You can download the updated paper here: https://www.vmware.com/resources/techresources/10444


Disaster Recovery for Virtualized Business Critical Applications (Part 3 of 3)

Planned Migration:

One of the relatively newer use cases for SRM is planned migration. With this use case, customers can migrate their business critical workloads to the recovery or cloud provider site in a planned manner. This could be in anticipation of an upcoming threat such as a hurricane or other disaster, or part of an actual datacenter migration to a different location or cloud provider.

Continue reading

Disaster Recovery for Virtualized Business Critical Applications (Part 2 of 3)

Protection Groups:

A protection group is a group of virtual machines that fail over together to the recovery site. Protection groups contain virtual machines whose data has been replicated by array-based replication or by vSphere Replication (VR). A protection group typically contains virtual machines that are related in some way, such as:

  • A three-tier application (application server, database server, Web server)
  • Virtual machines whose virtual machine disk files are part of the same datastore group.

Continue reading

Disaster Recovery for Virtualized Business Critical Applications (Part 1 of 3)

The purpose of the exercise was to demonstrate use cases for disaster recovery of real business critical applications (BCAs) leveraging VMware solutions such as VMware Site Recovery Manager (SRM). Techniques to protect common business critical applications such as Microsoft Exchange, Microsoft SQL Server, SAP, and Oracle databases against disaster are discussed.

Continue reading

The Case for SAP Central Services and VMware Fault Tolerance

What’s the CPU Utilization Of Standalone SAP Central Services in a Virtual Machine?

Since VMware introduced VMware Fault Tolerance (FT), we have considered the deployment option of installing SAP Central Services in a 1-vCPU virtual machine protected by VMware FT. FT creates a live shadow instance of a virtual machine that is always up to date with the primary virtual machine. In the event of a hardware outage, VMware FT automatically triggers failover, ensuring zero downtime and preventing data loss. Central Services is a single point of failure in the SAP architecture: it manages transaction locking and messaging across the SAP system, and failure of this service results in downtime for the whole system. Hence Central Services is a strong candidate for FT, but FT currently supports only one vCPU (vSphere 5.x), so some guidance is required on how many users can be supported in this configuration. VMware gave technical previews of multi-vCPU virtual machines protected by FT at VMworld 2013 and 2014, but now, better late than never, here are the results of a lab test demonstrating the performance of standalone Central Services in a 1-vCPU virtual machine. Continue reading

OLTP performance on Virtualized SQL 2014 with All Flash Arrays


TPC-C is an on-line transaction processing (OLTP) benchmark (see the TPC-C main site). TPC-C uses a mix of five concurrent transaction types of varying complexity. The database comprises nine types of tables with a wide range of record and population sizes. TPC-C performance is measured in transactions per minute (TPM).

The goal of this exercise was to see whether 1 million TPM could be achieved on virtualized SQL Server 2014 backed by an all-flash storage array in a TPC-C-like test. The results were compared between two VM sizes (one within NUMA boundaries and one exceeding NUMA boundaries).

Continue reading