One of the relatively newer use cases for SRM is planned migration. With this use case, customers can migrate their business critical workloads to the recovery or cloud provider sites in a planned manner. This could be in planning for an upcoming threat such as a hurricane or other disaster or an actual datacenter migration to a different location or cloud provider.
A protection group is a group of virtual machines that fail over together to the recovery site. Protection groups contain virtual machines whose data has been replicated by array-based replication or by VR. Typically contains virtual machines that are related in some way such as:
A three-tier application (application server, database server, Web server)
Virtual machines whose virtual machine disk files are part of the same datastore group.
The purpose of the exercise was to demonstrate use cases for disaster recovery of real business critical applications (BCA) leveraging VMware solutions such as VMWare Site Recovery Manager (SRM). Techniques to protect against disaster for common business critical applications such as Microsoft Exchange, Microsoft SQL Server, SAP and Oracle Databases are discussed.
What’s the CPU Utilization Of Standalone SAP Central Services in a Virtual Machine?
Since VMware came out with VMware Fault Tolerance (FT) we have considered the deployment option of installing SAP Central Services in a 1 x vCPU virtual machine protected by VMware FT. FT creates a live shadow instance of a virtual machine that is always up-to-date with the primary virtual machine. In the event of a hardware outage, VMware FT automatically triggers failover—ensuring zero downtime and preventing data loss. Central Services is a single-point-of-failure in the SAP architecture that manages transaction locking and messaging across the SAP system and failure of this service results in downtime for the whole system. Hence Central Services is a strong candidate for FT but FT currently only supports 1 x vCPU (vSphere 5.x), so some guidance is required on how many users we can support in this configuration. VMware has given technical previews of multi-vCPU virtual machines protected by FT at VMworld 2013/2014, but now, better late than never, here are the results of a lab test demonstrating the performance of standalone Central Services in a 1 x vCPU virtual machine. Continue reading →
Availability of the applications is one of the important requirements for business critical applications. The vSphere platform provides capabilities to protect against hardware and software failures to meet the stringent SLA requirements of business critical applications.
Virtualized applications can avail of features such as vSphere HA protects from HW failures by restarting virtual machines on surviving hosts if a physical host were to fail. vSphere FT (Fault Tolerance) protects critical servers such as load balancers and other central servers with a small footprint (1vCPU) with zero downtime in the event of HW failures.
vSphere App HA for Application level protection for supported applications. Third party solutions that leverage vSphere Application Awareness API such as Symantec Application HA , NEVERFAIL, etc. layer on top of VMware HA and provide monitoring and availability for most of the common business critical applications.
The type of licensing impacts the cluster design for SAP virtualization. SAP is supported on most common database platforms such as SQL, Oracle, DB2 and SYBASE. When customers procure SAP, they can choose to buy the database licensing through SAP or purchase it directly from the database vendor. This decision impacts the cluster design for virtualized SAP environments.
Let us look at these two scenarios and their impact on the design.
Scenario 1: Database License procured from the DB vendor for SAP:
Database vendors have differing but usually very restrictive policies regarding virtual machines running databases. The cost of licensing databases in the extreme case could force a customer to license for the entire cluster, even though the database could be using only a small subset of the resources. Due to the uncertainty and the risk involved with DB licensing in this situation, it might be prudent to separate the entire database workload into its own cluster. By separating the entire database workload, the physical hardware used for databases can be isolated and licensed fully. Since only database workloads exist in this cluster one can achieve consolidation and efficiency for databases. The main disadvantage is the added overhead of having a separate cluster for databases. Since SAP landscapes have many modules with each module having its own individual database, creating a separate DB cluster with a good number of hosts is worthwhile and justified. Continue reading →
Over the past few years, there has been significant acceleration in adoption of the VMware platform for virtualization of business critical applications. When vSphere 5 was introduced with its initial support for up to 32 vCPU many of the vertical scalability concerns that existed earlier were addressed. This has been increased to 64 processors with the later vSphere 5.x releases ensuring that more than 99% of all workloads will fit vertically.
Having personally worked in IT infrastructure for more than 20 years with a strong focus on implementing and managing business critical applications, I see a general reluctance from application owners to virtualize business critical applications. When virtualizing business applications there are many critical factors one should consider. I seek to address the typical concerns of application owners about Virtualization with this multipart series on Virtualizing BCA. Continue reading →
Recently in partner workshops I have come across some interesting discussions about the impact of hyper-threading and NUMA in sizing business critical applications on VMware. So here is an SAP example based on SAP’s sizing metric “SAPS” (a hardware-independent unit of measurement that equates to SAP OLTP throughput of Sales and Distribution users). The examples here refer to vSphere scheduling concepts in this useful whitepaper The CPU Scheduler in VMware vSphere 5.1 .
SAP sizing requires the SAPS rating of the hardware which for estimation purposes can be obtained from certified SAP benchmarks published at http://www.sap.com/solutions/benchmark/sd2tier.epx . Let’s use certification 2011027 and assume that we plan to deploy on similar hardware as used in this benchmark. This is a virtual benchmark on vSphere 5 with the following result: 25120 SAPS (at ~100% CPU) for 24 vCPUs running on a server with 2 processors, 6 cores per processor and 24 logical CPUs as hyper-threading was enabled. This is a NUMA system where each processor is referred to as a NUMA node. (Note cert 2011027 is an older benchmark, the SAPS values for vSphere on newer servers with faster processors would be different/higher, hence work with the server vendors to utilize the most recent and accurate SAPS ratings). Continue reading →
This is a follow up to the blog I posted in Jan 2013 which identified a generic formula to estimate the availability, expressed as a percentage/fraction, of SAP virtual machines in an ESXi cluster. The details of the formula are in this whitepaper . This blog provides some example results based on some assumed input data. I used a spreadsheet to model the equation and generate the results – this is shown at the end. The formula is based on mathematical probability techniques. The availability of SAP on an ESXi cluster is dependent on: the probability of failure of multiple ESXi hosts based on the number of spares; the probability that the SPOFs (database & central services) are failing over due to a VMware HA event (depends on failover times and the frequency of ESXi host failures).
The example starts with a single 4-node ESXi cluster running multiple SAP database, application server and central services virtual machines (VMs) corresponding to different SAP applications (ERP, BW, CRM etc.). A sizing engagement has determined that 4 ESXi hosts are required to drive the performance of all the SAP VMs (the SAP landscape). We assume the sizing is such that the memory of all the VMs will not fit into the physical memory of three or less hosts, and as we typically have memory reservations set (a best practice for mission critical SAP), VMs may not restart after a VMware HA event. So we conservatively treat any host failures that result in less than 4 ESXi hosts as downtime for the SAP landscape (not true at the individual VM/SAP system level as some of the VMs can be de-prioritized in the degraded state in favor of others but we are going with the landscape level approach to provide a worst case estimate). For this reason we design with redundancy by adding extra ESXi hosts in the cluster so I will compare three options with different degrees of redundancy: Continue reading →
"Arithmetic is where the answer is right and everything is nice and you can look out of the window and see the blue sky - or the answer is wrong and you have to start over and try again and see how it comes out this time." ~Carl Sandburg
When we architect SAP on VMware deployments an important topic is how we design for high availability. We have options in the VMware environment from VMware HA, VMware FT and use of in-guest clustering software like Microsoft Cluster Services or Linux-HA. So can we determine a numerical availability for our design expressed as a fraction/percentage (same metric used to define uptime Service Level Agreements like 99.9% )? Yes, there are ways to estimate this value and one method is explained in the following paper http://www.availabilitydigest.com/public_articles/0712/sap_vmware.pdf . This paper develops an equation to estimate the availability of SAP running on an ESXi cluster expressed as a fraction/percentage. The concepts are taken from other papers at http://www.availabilitydigest.com (a digest of topics on high availability) and are based on mathematical algebra and probability theory that have been previously used in the IT industry for availability calculations. The availability metric (e.g. 99.9% or 0.999) is essentially a probability hence we use mathematical probability techniques to calculate the overall availability of a system. Continue reading →