Author Archives: Vas Mitra

About Vas Mitra

Vas is an SAP Solutions Architect who has worked on SAP and business-critical application initiatives at VMware for over five years. Activities include SAP on VMware training/workshops, POCs, pre-sales support and development of SAP on VMware whitepapers and best practice guides. Prior to VMware, Vas’ roles and experiences include: SAP Basis Administrator; SAP ABAP developer; SAP Solutions Engineer. These roles have been with a large Systems Integrator, a server vendor and SAP IT operations in pharmaceutical/chemical companies.

Hyper-threading Impact on Virtual SAP Sizing and Performance – Part 1 of 2

This is part 1 of a 2-part blog covering how hyper-threading impacts virtual SAP sizing and performance. Many virtual SAP deployments leverage Intel’s hyper-threading (HT) technology. For each processor core that is physically present, the hypervisor sees two logical processors and shares the workload between them when possible. A vCPU can be scheduled on one logical processor of a core while the other logical processor of that core is idle. In this blog this is referred to as one vCPU scheduled per core. Alternatively, two vCPUs can be scheduled on the two logical processors of the same core. This is referred to as two vCPUs scheduled per core. For more background on vSphere scheduling functionality, please see the whitepaper The CPU Scheduler in VMware vSphere.

I will show three different sizing scenarios.

Scenario 1

The first scenario above shows:

  • 14 physical cores with HT enabled (28 logical CPUs).
  • A virtual machine (VM) with 14 vCPUs.
  • vSphere will schedule each vCPU on a logical CPU on a separate dedicated physical core (default behavior). The scheduler prefers a whole idle core, where both logical CPUs of the core are idle, over a partial idle core, where one logical CPU is idle while the other is busy.
  • There is spare capacity for more performance as not all the logical CPUs are utilized.

Scenario 2

The scenario above shows:

  • A virtual machine with 28 vCPUs.
  • vSphere schedules the vCPUs across all the logical CPUs – the two logical CPUs of each physical core are both utilized. This can be achieved a number of ways:
    • Setting manual CPU affinity in the VM to force the vCPUs to be scheduled on specific logical CPUs.
    • Provisioning more vCPUs than there are cores on the host.
    • Deploying a VM with twice the number of vCPUs as cores in a socket and setting the VM-level advanced parameter “numa.vcpu.preferHT” to TRUE (the host-wide equivalent is “Numa.PreferHT”). All the vCPUs will be scheduled across all the logical CPUs within the socket/NUMA node.
  • Utilization of all the logical CPUs in Scenario 2 provides, on average, a 15% boost in SAP performance/transaction throughput compared to Scenario 1. In SAP sizing, transaction throughput and performance are measured in the metric “SAPS”, so Scenario 2 provides about 15% more SAPS than Scenario 1.

Scenario 3

This scenario shows:

  • 16 physical cores with HT enabled (32 logical CPUs)
  • A virtual machine with 16 vCPUs. vSphere will schedule each vCPU on a logical CPU on a separate dedicated physical core (default behavior) – same as Scenario 1.
  • The performance/SAPS throughput is approximately the same as Scenario 2 (based on 15% HT benefit).
    • Assuming SAPS scale linearly as we add vCPUs and cores to Scenario 1, adding 15% more vCPUs (and cores) provides performance equivalent to Scenario 2.
    • Scaling up vCPUs in Scenario 1 by 15% = 1.15 x 14 ≈ 16 vCPUs (on 16 cores) – this is Scenario 3.

Comparing Scenarios

SAP sizing involves calculations in SAPS. You can see an example at https://blogs.vmware.com/apps/2017/06/awg_s4hana_part1.html#more-2217 . The methodology and example shown here enable you to calculate the number of vCPUs required for business requirements expressed in SAPS. You then have the option to design the VMs like Scenario 2 or 3:

  • If we need 16 vCPUs on 16 cores (Scenario 3), an alternative configuration with fewer cores and equivalent SAPS performance is Scenario 2 (28 vCPUs on 14 cores). The calculation is: 16 / 1.15 ≈ 14, i.e.

M = # of cores utilized (either 2 vCPUs or 1 vCPU scheduled per core)

SAPS of [M cores with 1 vCPU per core] = SAPS of [ M/1.15 cores with 2 vCPUs per core]

  • If we need 28 vCPUs on 14 cores (Scenario 2), an alternative configuration with equivalent SAPS, fewer vCPUs but more cores is Scenario 3 (16 vCPUs on 16 cores). The calculation is: 14 x 1.15 ≈ 16, i.e.

SAPS of [ M cores with 2 vCPUs per core] = SAPS of [M x 1.15 cores with 1 vCPU per core]

The above equations are estimates as we assume linear scalability of SAPS with vCPUs in all the scenarios and an average HT benefit of 15%.
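The two equations can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a sizing tool: the function names are made up for this example, and the 15% HT benefit and linear scaling are the same assumptions stated above.

```python
HT_BENEFIT = 0.15  # average hyper-threading gain assumed in this post


def cores_at_two_vcpus_per_core(cores_at_one_vcpu, ht_benefit=HT_BENEFIT):
    """Cores needed with 2 vCPUs scheduled per core for the same SAPS."""
    return cores_at_one_vcpu / (1 + ht_benefit)


def cores_at_one_vcpu_per_core(cores_at_two_vcpus, ht_benefit=HT_BENEFIT):
    """Cores needed with 1 vCPU scheduled per core for the same SAPS."""
    return cores_at_two_vcpus * (1 + ht_benefit)


# Scenario 3 (16 vCPUs on 16 cores) expressed as Scenario 2
scenario2_cores = round(cores_at_two_vcpus_per_core(16))
print(scenario2_cores, "cores,", 2 * scenario2_cores, "vCPUs")  # 14 cores, 28 vCPUs

# Scenario 2 (28 vCPUs on 14 cores) expressed as Scenario 3
scenario3_cores = round(cores_at_one_vcpu_per_core(14))
print(scenario3_cores, "cores,", scenario3_cores, "vCPUs")  # 16 cores, 16 vCPUs
```

Rounding to the nearest whole core mirrors the approximations used in the text; in a real sizing exercise you may prefer to round up.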

Conclusion

I have shown above that when sizing VMs we have the option to configure them with 1 vCPU scheduled per core or 2 vCPUs scheduled per core. The equations above show how these options are numerically related. The following table summarizes the difference between the options.

1 https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html

2 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/vmw-vsphere-virtual-saphana-application-workload-guidance-design.pdf

3 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/whitepaper/sap_hana_on_vmware_vsphere_best_practices_guide-white-paper.pdf

Part 2 of this blog will seek to demonstrate some of the concepts discussed here with an actual SAP workload.

Performance of SAP Central Services with VMware Fault Tolerance

Many SAP customers on their virtualization journey are considering protecting SAP Central Services with VMware Fault Tolerance (FT). Central Services is a single point of failure in the SAP architecture: it manages transaction locking and messaging across the SAP system, and failure of this service results in downtime for the whole system. It is therefore a strong candidate for VMware FT. We have conducted a 1000-user test on vSphere 6.x, which is documented in Section 4 of the SAP VMware Best Practice Guide.

The VMware vSphere 6 Fault Tolerance whitepaper mentions “One of the most common performance observations of virtual machines under FT protection is a variable increase in the network latency of the virtual machine”. Given this, how does running Central Services under VMware FT impact the performance of the SAP application as experienced by the SAP business user? I will demonstrate a basic example here.

A potential validation at the infrastructure level is to run the network “ping” command and the SAP utility “niping”, an SAP network utility used to help analyze network performance. When I ran these commands at the OS command line to test network performance between an SAP application server and Central Services in two separate VMs, results showed an increase in latency from about 0.3 ms to 1.8 ms when VMware FT was turned on for the Central Services VM. This is expected behavior and does not reflect the performance that an SAP business user will experience with VMware FT.

My next test was a basic SAP application-level test. It is a custom SAP program (written in ABAP) that automates the change of a sales order document; once executed, it updates around 50 sales orders automatically in series. For each sales order that is changed, a lock is created and managed by Central Services. The program uses standard SAP techniques based on SAP “BDC” for mass input of data by simulating user inputs in transaction screens. The SAP transaction being called is the Change Sales Order transaction (“VA02”). The program is executed in online mode/foreground via the SAP client SAPGUI. After each online interaction SAPGUI records the response time in milliseconds at the bottom right; this was used as the performance metric.

The following diagram shows the test environment.

The following tables show the results.

The difference in average online response time between VMware FT off and on is around 2%. The tests simulate a single user executing the change sales order transaction multiple times in quick succession. This is a basic validation, which should be followed by a multi-user test with actual users or with business workloads simulated in a software testing tool. Other tests will show different results from those shown here, and mileage is expected to vary. In this example the simulated user makes many document changes in a short period of time with no think time. In reality an online business user spends more time processing data within a transaction, an activity that uses resources on the application server but does not require Central Services. The frequency of lock requests generated by a single real user would therefore be lower than in this example.

SAP and NSX Micro-Segmentation Example & Demo

Many companies deploying SAP systems run business processes that incorporate credit card payment transactions. Credit cards are subject to strict security standards developed by the PCI Security Standards Council, which is a consortium of the largest international payment card issuers. These standards require security settings within the SAP application and, where SAP is deployed on the VMware SDDC, they also affect the VMware layer with requirements such as “Install and maintain a firewall configuration to protect cardholder data”. This requirement is addressed by micro-segmentation, which makes the data center network more secure by isolating each related group of virtual machines onto a distinct logical network segment, allowing the administrator to firewall traffic traveling from one segment of the data center to another (east-west traffic). This limits attackers’ ability to move laterally in the data center. Micro-segmentation is powered by the Distributed Firewall (DFW), a component of NSX. DFW operates at the ESXi hypervisor kernel layer and offers control at the vNIC level, which is very close to the guest operating system without being inside it.

For SAP, micro-segmentation means we can create flexible security policies that align to the multi-tier architecture of an individual SAP system (presentation, application and database tiers) and to the landscape of the SAP environment (separating production from non-production). The diagram below shows an SAP micro-segmentation example based on the NetWeaver ABAP stack with a backend database. The different tiers/components of the SAP architecture are:

  • Presentation tier – in this example I am using the SAP client “SAPGUI” to access the application tier (note: customer environments would include browser-based access, load balancers and a web tier)
  • Application tier – application servers based on the NetWeaver ABAP stack
  • Central Services / Global Host – handles SAP locking services, messaging between the app servers and an NFS share required by all the application servers
  • Database tier – services are database dependent

The components are isolated into their own NSX security groups. In this example (other classifications are possible), an NSX security group is a definition in NSX that corresponds to a logical grouping of VMs within which communication flows freely. Communication into or out of a security group from or to another group depends on the firewall rules.

[Diagram: SAP micro-segmentation example with NSX security groups]

Security policies in the above design provide the following controls:

  • Controlled communication path limited to specific services and protocols between tiers
  • External access only permitted to the application tier via the SAP presentation service
  • Access between application and database VMs via specific database services
  • SAP services ports vary depending on the “Instance Number” assigned to the application servers and Central Services. Some values are shown here.
  • In this example I have included external access from a monitoring tool, vRealize Operations (vROps) with the Blue Medora SAP Management Pack. This needs access to the application tier via certain ports.

The top right of the diagram above shows an NSX screenshot of the security group definition for the application tier. It shows how VM membership can be dynamically assigned to a security group, for example based on the VM naming convention. This way, newly provisioned application server VMs automatically inherit the security policies of the application tier security group.
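To make the idea of name-based group membership and default-deny rules between groups more concrete, here is a small Python model. It is a conceptual sketch only and does not use the NSX API: the VM name prefixes, group names and the `Rule` class are hypothetical, and the ports are illustrative values (dispatcher 32NN and message server 36NN for assumed instance numbers 00 and 01, plus the Oracle listener default).

```python
from dataclasses import dataclass

# Hypothetical VM naming convention mapped to security groups
GROUP_BY_PREFIX = {
    "sap-app-": "SAP-APP",
    "sap-cs-": "SAP-CS",
    "sap-db-": "SAP-DB",
}


@dataclass(frozen=True)
class Rule:
    source: str
    destination: str
    port: int


# Illustrative allow rules; real ports depend on instance numbers and database
ALLOW_RULES = [
    Rule("EXTERNAL", "SAP-APP", 3200),  # SAPGUI to dispatcher, instance 00
    Rule("SAP-APP", "SAP-CS", 3601),    # message server, instance 01
    Rule("SAP-APP", "SAP-DB", 1521),    # Oracle listener default port
]


def group_for(vm_name):
    """Assign a VM to a group from its name; unknown VMs count as EXTERNAL."""
    for prefix, group in GROUP_BY_PREFIX.items():
        if vm_name.lower().startswith(prefix):
            return group
    return "EXTERNAL"


def is_allowed(src_vm, dst_vm, port):
    """Free flow inside a group; between groups only explicit rules pass."""
    src, dst = group_for(src_vm), group_for(dst_vm)
    if src == dst and src != "EXTERNAL":
        return True
    return Rule(src, dst, port) in ALLOW_RULES  # otherwise default deny


print(is_allowed("sap-app-03", "sap-db-01", 1521))  # True
print(is_allowed("laptop-17", "sap-db-01", 1521))   # False
```

A newly provisioned VM named "sap-app-03" lands in the application tier group purely because of its name, which is the behavior the dynamic membership criteria provide.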

The following diagram shows the high level architecture that makes all this happen.

[Diagram: high-level architecture]

The following screenshot shows an example configuration of the NSX communication paths based on the micro-segmentation design shown above (note: actual implementations will differ based on customer security requirements).

[Screenshot: example configuration of the NSX communication paths]

You can see a recorded demo of the configuration at the URL below. The demo starts in vRealize Operations where the Blue Medora SAP Management Pack has been configured to monitor the SAP system. Data collection fails due to activation of the NSX firewall.  The NSX configuration is shown and tested and the services are configured to re-establish communication between vROps and the SAP system.

DEMO URL: https://www.youtube.com/watch?v=qdPhejVCG3s&t=82s&index=1&list=PLCED9FDF31C7C0562


Troubleshooting SAP Performance with VMware vRealize Operations

Over the years VMware support has investigated numerous performance escalations of virtualized tier 1 applications. One of the more challenging aspects of this task is coordinating all the key performance monitoring metrics across the different technology layers, from the application down to the hypervisor. This is where VMware vRealize Operations (vROps) with the Blue Medora Management Packs can help expedite the troubleshooting process. I will show an example here with a virtualized SAP on Oracle system.

The Blue Medora website has links to all the installation documentation for the different application management packs. Once the SAP and Oracle Management Packs are installed and configured in vROps to connect to the individual SAP and Oracle systems, the adapters will discover and generate SAP and Oracle objects in vROps, which can be accessed via the menu “Home –> Environment –> All Objects”. The following screenshot shows the discovery of the SAP system.

[Screenshot: SAP system objects discovered in vROps]

As shown above the different instances of the multi-tier SAP system have been discovered: two application servers; Central Services; database instance with system ID = “TST”.  You can then drill down into the SAP metrics for an application server.

This SAP system is running on an Oracle database. The Oracle management pack will discover the Oracle database as shown below.

[Screenshot: Oracle database discovered by the Oracle management pack]

Now, how do we troubleshoot this environment? Let’s walk through an example.

Performance Escalation Logged with the Helpdesk

SAP end users are complaining of slow response times on the SAP system. Some users are claiming it’s taking a long time to log into SAP.

Order of Analysis

Analysis will involve monitoring three technology layers. The analysis will start at the virtual layer, then move up to the database and finally to the SAP layer – this is described in the diagram below.

[Diagram: order of analysis across the virtual, database and SAP layers]

You can access the different metrics in vROps via the menu:

“Home –> Environment –> All Objects –> <Select Adapter> –> <select adapter object> –> Troubleshooting –> All Metrics –> <select object> –> <select counter> …..”

Step 1 Virtual Metrics

We begin at the infrastructure layer. The following table shows some of the key virtual metrics for this example.

[Table: key virtual metrics]

From above we can see that there do not appear to be any major resource bottlenecks at the infrastructure layer. Next we move up to the database layer.

Step 2 Oracle Metrics

The following table shows two Oracle metrics for this example (note there are other Oracle metrics that would also need to be considered for an in-depth analysis).

[Table: Oracle metrics]

Oracle Logical Reads Per User Call corresponds to the average number of Oracle blocks read from the buffer cache (part of Oracle’s System Global Area) to service queries from the application server. If a block is not available in the cache it is serviced from disk. A large number of logical reads per user call may be due to expensive SQL statements, which can be addressed via SQL tuning. Threshold values and guidelines for the logical reads per user call counter (and other key Oracle metrics) are documented in the SAP Knowledgebase article 618868 – FAQ: Oracle performance.

The Oracle database wait time ratio counter helps to determine if the database is currently experiencing a high percentage of waits/bottlenecks. A higher database wait time ratio indicates that system performance can be improved using “wait event tuning”. The latter requires more in-depth analysis of Oracle wait events. These wait event counters can be accessed within Oracle by the database administrator or can be made available in vROps via the Blue Medora management pack for Oracle Enterprise Manager.

In this example both the logical reads per user call and the database wait time ratio have increased to levels that require more in-depth analysis to determine if Oracle or bad SQL statements are the cause of the performance problem. However, it is possible that Oracle is performing as expected when processing the SQL statements submitted by the application server. We now need to move to the SAP layer, as ultimately all workload originates from the application tier.

Step 3 SAP Metrics

In the final step we look at the SAP counters which can help explain the workload running on the application server. The following table shows some SAP metrics for this example (note there are other relevant metrics).

[Table: SAP metrics]

The SAP dialog work process utilization shows the percentage of work processes allocated for online user activity that are currently being utilized on the SAP application server. In this example the increase in work process utilization is suspect and requires further inspection by the SAP administrator. At this point we would notify the SAP administrator to use SAP tools to troubleshoot further. In this example this step reveals the root cause behind the user complaints.

Root cause: in this performance troubleshooting example the root cause is at the SAP application layer, where a batch job was scheduled on the application server and competed with the online users. The batch job utilized many of the available work processes, reducing the number of free work processes available for the online users.

Potential resolution: reschedule the batch job on other application servers or at a different time; increase the number of work processes.
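The order of analysis used in this example (virtual layer first, then Oracle, then SAP) can be sketched as a simple threshold check in Python. This is a rough illustration only: the metric keys, limits and sample values are hypothetical placeholders and are not taken from vROps, the management packs or the tables above.

```python
# Checks listed in the order of analysis: virtual, then Oracle, then SAP.
# Metric names and limits are hypothetical placeholders.
CHECKS = [
    ("virtual", "cpu_ready_pct", 5.0),
    ("oracle", "logical_reads_per_user_call", 30.0),
    ("oracle", "db_wait_time_ratio_pct", 40.0),
    ("sap", "dialog_wp_utilization_pct", 80.0),
]


def triage(samples):
    """Return (layer, metric, value) for each sample above its limit."""
    findings = []
    for layer, metric, limit in CHECKS:
        value = samples.get(layer, {}).get(metric)
        if value is not None and value > limit:
            findings.append((layer, metric, value))
    return findings


# Hypothetical readings resembling the scenario described in this post
samples = {
    "virtual": {"cpu_ready_pct": 1.0},
    "oracle": {"logical_reads_per_user_call": 55.0, "db_wait_time_ratio_pct": 62.0},
    "sap": {"dialog_wp_utilization_pct": 97.0},
}

for layer, metric, value in triage(samples):
    print(layer, metric, value)
```

With these sample values nothing is flagged at the virtual layer, while the Oracle and SAP layers are flagged for a closer look, which matches the path the analysis took above.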

SUMMARY

I have shown a troubleshooting scenario of an SAP on Oracle system using vROps to analyze metrics from the vSphere, Oracle and SAP layers.  vROps with the Blue Medora Management Packs has enabled the required visibility across these layers to expedite root cause analysis. In this example I have accessed the required metrics directly via the menu “Home –> Environment –> All Objects –> <Select Adapter> –> etc”. Alternatively you can navigate to the relevant metrics via the out-of-the-box dashboards provided by the management packs – an example of this is described at http://www.bluemedora.com/blog/advanced-troubleshooting-of-virtualized-sap-environments-with-vrealize-operations/ .

Thanks to my colleagues for their guidance on vROps and Oracle: Cameron Jones; Jeff Godfrey; Ben Todd; John Dias; Sudhir Balasubramanian.

Announcing General Availability of VMware Adapter for SAP Landscape Management version 1.4!

****** THIS BLOG IS POSTED ON BEHALF OF THE AUTHOR NELSON YAN *******

At VMworld EMEA in October 2016, we announced the VMware private cloud solution for SAP and the upcoming release of our Adapter for SAP Landscape Management. To read more about the announcement by my colleague Alberto Farronato, check out his blog post here.

Our mission is simple: to provide a best-in-class software-defined infrastructure solution that simplifies the deployment and management of SAP landscapes so businesses can focus on innovation instead of “keeping the lights on”.

By virtualizing the SAP environment, the private cloud solution for SAP removes the pains and constraints of running SAP on hardware-defined infrastructure: lack of scalability, high TCO, low business continuity during maintenance, and many more. Furthermore, we can now drive more automation and intelligence in the software-defined infrastructure once SAP has been re-platformed onto VMware virtualized infrastructure.

With that said, I am excited to announce the general availability of VMware Adapter for SAP Landscape Management version 1.4! The Adapter will radically simplify how SAP Basis admins manage and deploy SAP landscapes, and it more tightly integrates SAP Landscape Management with the underlying VMware virtualized infrastructure.

The all new VMware Adapter for SAP Landscape Management

Our latest iteration brings new capabilities and support that help customers address the challenges of digital transformation for both SAP and VMware.

  • SA-API – gives a programmatic way to provision SAP landscapes and the underlying VMware infrastructure
  • Integration with vRealize Automation – allows Basis admins to leverage SA-API to create templates that end users can consume to self-provision SAP landscapes

The new features augment the existing capabilities of the adapter, taking automation and self-service to the next level so SAP Basis admins can focus on value-adding, innovative tasks instead of “keeping the lights on” operations. However, let’s not forget the key existing capabilities of the adapter:

  • Provisioning – System Cloning, Copying, and System Refresh
    • Automate key SAP Basis provisioning tasks such as system cloning, copying, and system refresh directly in vCenter with SAP Landscape Management.
  • Operations – SAP Hosts, Storage, and Network Migration
    • Migrate a VM and switch its data set and network to stand up SAP hosts, move environments, and deploy disaster recovery solutions, all through the SAP Landscape Management interface.

For customers interested in deploying the Adapter in production SAP landscapes, we are also introducing the option to purchase Production-level support from VMware GSS.  Enterprises can now virtualize SAP with confidence, knowing that they have the backing of both SAP and VMware. Support is optional, and the free community edition of the Adapter continues to be available for non-production environments.

But there’s more… SAP Solution Guide!

In addition, we have also been putting together a comprehensive solution guide on how to install, deploy, and manage an SAP environment on top of VMware SDDC. The guide captures the essence of VMware private cloud solution for SAP, which defines the software stack to virtualize, secure and automate SAP environments leveraging VMware’s software-defined architecture. Key topics include:

  • Best practices for deploying SAP HANA on VMware vSphere
  • Implementing SAP Solutions on VMware products such as vSphere, vSAN, and NSX

The guide is an invaluable resource to help you take the first step in virtualizing your SAP environment!

Interested to Learn More?

Visit the VMware Adapter for SAP Landscape Management Product Page

Read the SAP Solution Guide

Updated for vSphere 6 – SAP on VMware Best Practices guide

SAP production support for vSphere 6 has been available since the latter half of last year – see http://scn.sap.com/docs/DOC-27384 . The best practices guide has been updated with the latest vSphere 6 information to help you with virtualizing SAP. Some of the new content includes:

  • Estimating SAPS of virtual machines and how this is aligned with ESXi scheduling behavior.
  • Updated analysis of high availability options for SAP in the virtual environment. This includes the use of VMware Fault Tolerance for SAP Central Services installed in a multi-vCPU virtual machine.
  • A section where all the best practices are summarized and categorized by topic (CPU, memory, high availability etc.). Those already familiar with the vSphere concepts and use cases can skip straight to this section.

Certain topics like HANA and Business Objects have separate papers dedicated to them – these are referenced and the content is not repeated in this document.

The paper is available for download here.

The Case for SAP Central Services and VMware Fault Tolerance

What’s the CPU Utilization Of Standalone SAP Central Services in a Virtual Machine?

Since VMware introduced VMware Fault Tolerance (FT) we have considered the deployment option of installing SAP Central Services in a 1 x vCPU virtual machine protected by VMware FT. FT creates a live shadow instance of a virtual machine that is always up to date with the primary virtual machine. In the event of a hardware outage, VMware FT automatically triggers failover, ensuring zero downtime and preventing data loss. Central Services is a single point of failure in the SAP architecture: it manages transaction locking and messaging across the SAP system, and failure of this service results in downtime for the whole system. Hence Central Services is a strong candidate for FT, but FT currently supports only 1 x vCPU (vSphere 5.x), so some guidance is required on how many users we can support in this configuration. VMware has given technical previews of multi-vCPU virtual machines protected by FT at VMworld 2013/2014, but now, better late than never, here are the results of a lab test demonstrating the performance of standalone Central Services in a 1 x vCPU virtual machine. Continue reading

SAP on VMware Sizing & Design Example

Recently in partner workshops I have come across some interesting discussions about the impact of hyper-threading and NUMA on sizing business-critical applications on VMware. So here is an SAP example based on SAP’s sizing metric “SAPS” (a hardware-independent unit of measurement that equates to SAP OLTP throughput of Sales and Distribution users). The examples here refer to vSphere scheduling concepts described in the useful whitepaper The CPU Scheduler in VMware vSphere 5.1.

SAP sizing requires the SAPS rating of the hardware, which for estimation purposes can be obtained from certified SAP benchmarks published at http://www.sap.com/solutions/benchmark/sd2tier.epx . Let’s use certification 2011027 and assume that we plan to deploy on hardware similar to that used in this benchmark. This is a virtual benchmark on vSphere 5 with the following result: 25120 SAPS (at ~100% CPU) for 24 vCPUs running on a server with 2 processors, 6 cores per processor and, as hyper-threading was enabled, 24 logical CPUs. This is a NUMA system where each processor is referred to as a NUMA node. (Note that cert 2011027 is an older benchmark; the SAPS values for vSphere on newer servers with faster processors would be different/higher, so work with the server vendors to obtain the most recent and accurate SAPS ratings.) Continue reading
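As a rough illustration of how such a benchmark result can feed a sizing estimate, the Python sketch below derives a SAPS-per-vCPU figure from the numbers quoted above and uses it to estimate a vCPU count. The requirement of 10000 SAPS and the 65% target utilization are hypothetical inputs chosen for this example, and the function name is made up. Linear scaling of SAPS with vCPUs is assumed.

```python
import math

BENCHMARK_SAPS = 25120   # certification 2011027, at ~100% CPU
BENCHMARK_VCPUS = 24     # 24 vCPUs on 12 cores with hyper-threading enabled

saps_per_vcpu = BENCHMARK_SAPS / BENCHMARK_VCPUS


def vcpus_required(required_saps, target_utilization):
    """vCPUs needed to deliver required_saps at a CPU utilization of 0-1."""
    usable_saps_per_vcpu = saps_per_vcpu * target_utilization
    return math.ceil(required_saps / usable_saps_per_vcpu)


print(round(saps_per_vcpu))         # 1047
print(vcpus_required(10000, 0.65))  # 15, for the hypothetical requirement
```

Note that this per-vCPU figure comes from a benchmark in which both logical CPUs of every core were in use, so it applies to designs with two vCPUs scheduled per core.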

Monitoring Business Critical Applications with VMware vCenter Operations Manager

The VMware BCA team recently worked with the vCOps gurus to produce the whitepaper “Monitoring Business Critical Applications with VMware vCenter Operations Manager”. 

The paper is available at https://www.vmware.com/files/pdf/solutions/Monitoring-Business-Critical-Applications-VMware-vCenter-Operations-Manager-white-paper.pdf .

The document provides an overview of monitoring the following business critical applications with vCOps: SAP; Exchange; Oracle; and SQL Server. It describes the vCOps adapters, including Hyperic, for these applications. Some key application performance metrics are covered and example dashboards are provided and explained.

Estimating Availability of SAP on ESXi Clusters – Examples

This is a follow-up to the blog I posted in Jan 2013, which identified a generic formula to estimate the availability, expressed as a percentage/fraction, of SAP virtual machines in an ESXi cluster. The details of the formula are in this whitepaper. This blog provides some example results based on some assumed input data. I used a spreadsheet to model the equation and generate the results; this is shown at the end. The formula is based on mathematical probability techniques. The availability of SAP on an ESXi cluster depends on two factors: the probability of failure of multiple ESXi hosts, given the number of spares; and the probability that the SPOFs (database and central services) are failing over due to a VMware HA event (which depends on failover times and the frequency of ESXi host failures).

The example starts with a single 4-node ESXi cluster running multiple SAP database, application server and central services virtual machines (VMs) corresponding to different SAP applications (ERP, BW, CRM etc.). A sizing engagement has determined that 4 ESXi hosts are required to drive the performance of all the SAP VMs (the SAP landscape). We assume the sizing is such that the memory of all the VMs will not fit into the physical memory of three or fewer hosts, and as we typically have memory reservations set (a best practice for mission-critical SAP), VMs may not restart after a VMware HA event. So we conservatively treat any host failures that leave fewer than 4 ESXi hosts as downtime for the SAP landscape (this is not true at the individual VM/SAP system level, as some of the VMs can be de-prioritized in the degraded state in favor of others, but we are going with the landscape-level approach to provide a worst-case estimate). For this reason we design with redundancy by adding extra ESXi hosts to the cluster, so I will compare three options with different degrees of redundancy: Continue reading
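The first factor, the chance that enough hosts are up given a number of spares, can be illustrated with a simple binomial calculation in Python. This is not the formula from the whitepaper: it ignores the failover time of the SPOFs, it assumes hosts fail independently with identical availability, and the 0.99 per-host availability and the choice of zero, one or two spares are hypothetical inputs for this sketch.

```python
from math import comb


def p_enough_hosts(total_hosts, required_hosts, host_availability):
    """Probability that at least required_hosts of total_hosts are up,
    assuming hosts fail independently with the same availability."""
    a = host_availability
    return sum(
        comb(total_hosts, up) * a**up * (1 - a) ** (total_hosts - up)
        for up in range(required_hosts, total_hosts + 1)
    )


HOST_AVAILABILITY = 0.99  # hypothetical input, not a measured figure
REQUIRED_HOSTS = 4        # from the sizing engagement described above

for spares in (0, 1, 2):
    total = REQUIRED_HOSTS + spares
    print(spares, "spare host(s):", p_enough_hosts(total, REQUIRED_HOSTS, HOST_AVAILABILITY))
```

Under these assumptions each additional spare host raises the probability that at least four hosts remain available, which is the motivation for designing the cluster with redundancy.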