

Hyper-threading Impact on Virtual SAP Sizing and Performance – Part 1 of 2

This is part 1 of a two-part blog series on how hyper-threading affects virtual SAP sizing and performance. Many virtual SAP deployments leverage Intel’s hyper-threading (HT) technology. For each physical processor core, the hypervisor sees two logical processors and shares the workload between them when possible. A vCPU can be scheduled on one logical processor of a core while the other logical processor of that core is idle; in this blog this is referred to as one vCPU scheduled per core. Alternatively, two vCPUs can be scheduled on the two logical processors of the same core; this is referred to as two vCPUs scheduled per core. For more background on vSphere scheduling functionality, please see the whitepaper The CPU Scheduler in VMware vSphere.

I will show three different sizing scenarios.

Scenario 1

The first scenario shows:

  • 14 physical cores with HT enabled (28 logical CPUs).
  • A virtual machine (VM) with 14 vCPUs.
  • vSphere will schedule each vCPU on a logical CPU on a separate dedicated physical core (default behavior). The scheduler prefers a whole idle core, where both logical CPUs of the core are idle, over a partial idle core, where one logical CPU is idle while the other is busy.
  • There is spare capacity for more performance as not all the logical CPUs are utilized.
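The placement preference described for Scenario 1 can be illustrated with a toy model. This is only a sketch of the "whole idle core before partially idle core" rule, not the actual vSphere scheduler; the function name `place_vcpus` and the tie-breaking by lowest core index are assumptions made for illustration.

```python
def place_vcpus(num_cores: int, num_vcpus: int) -> list:
    """Toy model: return the number of vCPUs placed on each physical core.

    Each core has two logical CPUs (HT enabled). A vCPU always goes to the
    least-loaded core, so a whole idle core (0 busy logical CPUs) is chosen
    before a partially idle core (1 busy logical CPU).
    """
    if num_vcpus > 2 * num_cores:
        raise ValueError("more vCPUs than logical CPUs in this toy model")
    load = [0] * num_cores
    for _ in range(num_vcpus):
        # min() returns the first least-loaded core (lowest index on ties).
        target = min(range(num_cores), key=lambda core: load[core])
        load[target] += 1
    return load


# Scenario 1: 14 vCPUs on 14 cores -> one vCPU per core, 14 logical CPUs left idle.
placement = place_vcpus(num_cores=14, num_vcpus=14)
print(placement)
print("idle logical CPUs:", 2 * 14 - sum(placement))
```

In this model a core only receives a second vCPU once every core already has one, which is why 14 vCPUs on 14 cores leave half of the logical CPUs unused.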

Scenario 2

The second scenario shows:

  • A virtual machine with 28 vCPUs.
  • vSphere schedules the vCPUs across all the logical CPUs; both logical CPUs of each physical core are utilized. This can be achieved in a number of ways:
    • Setting manual CPU affinity in the VM to force the vCPUs to be scheduled on specific logical CPUs.
    • Provisioning more vCPUs than there are cores on the host.
    • Deploying a VM with twice as many vCPUs as there are cores in a socket and setting the VM-level parameter “Numa.PreferHT” to true. All the vCPUs will then be scheduled across the logical CPUs within the socket/NUMA node.
  • Utilizing all the logical CPUs in Scenario 2 provides, on average, a 15% boost in SAP performance/transaction throughput compared to Scenario 1. In SAP sizing, transaction throughput and performance are measured in the metric “SAPS”, so Scenario 2 provides about 15% more SAPS than Scenario 1.
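As a rough illustration of the 15% figure, the relative throughput of Scenarios 1 and 2 can be estimated as below. This is a sketch under the assumptions stated in this post (an average HT benefit of 15%); the function `relative_saps` and the normalized value of 1.0 SAPS unit per core are hypothetical and are not measured SAPS numbers.

```python
HT_BENEFIT = 0.15  # average HT uplift quoted in this post (an estimate)


def relative_saps(cores_used: int, vcpus_per_core: int,
                  saps_per_core: float = 1.0) -> float:
    """Estimate relative SAPS for a VM, normalized to saps_per_core.

    vcpus_per_core is 1 (one logical CPU busy per core) or 2 (both busy).
    """
    if vcpus_per_core not in (1, 2):
        raise ValueError("vcpus_per_core must be 1 or 2")
    factor = 1.0 + HT_BENEFIT if vcpus_per_core == 2 else 1.0
    return cores_used * saps_per_core * factor


scenario_1 = relative_saps(cores_used=14, vcpus_per_core=1)  # 14 vCPUs
scenario_2 = relative_saps(cores_used=14, vcpus_per_core=2)  # 28 vCPUs
print(f"Scenario 1: {scenario_1:.1f} units")
print(f"Scenario 2: {scenario_2:.1f} units")
print(f"Uplift: {scenario_2 / scenario_1 - 1:.0%}")
```

Note that doubling the vCPU count from 14 to 28 on the same 14 cores does not double the estimate; it only adds the HT benefit.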

Scenario 3

This scenario shows:

  • 16 physical cores with HT enabled (32 logical CPUs)
  • A virtual machine with 16 vCPUs. vSphere will schedule each vCPU on a logical CPU on a separate dedicated physical core (default behavior) – same as Scenario 1.
  • The performance/SAPS throughput is approximately the same as Scenario 2 (based on 15% HT benefit).
    • As we linearly scale up vCPUs and cores in Scenario 1, adding an extra 15% vCPU (and cores) will provide us equivalent performance to Scenario 2.
    • Scaling up vCPUs in Scenario 1 by 15% = 1.15 x 14 ≈ 16 vCPUs (on 16 cores) – this is Scenario 3.

Comparing Scenarios

SAP sizing involves calculations in SAPS. You can see an example at https://blogs.vmware.com/apps/2017/06/awg_s4hana_part1.html#more-2217 . The methodology and example shown there enable you to calculate the number of vCPUs required for business requirements expressed in SAPS. You then have the option to design the VMs like Scenario 2 or Scenario 3:

  • If we need 16 vCPUs on 16 cores (Scenario 3), an alternative configuration with fewer cores and equivalent SAPS performance is Scenario 2 (28 vCPUs on 14 cores). The calculation is 16 / 1.15 ≈ 14, i.e.

M = number of cores utilized (with either 2 vCPUs or 1 vCPU scheduled per core)

SAPS of [M cores with 1 vCPU per core] = SAPS of [M / 1.15 cores with 2 vCPUs per core]

  • If we need 28 vCPUs on 14 cores (Scenario 2), an alternative configuration with equivalent SAPS, fewer vCPUs, and more cores is Scenario 3 (16 vCPUs on 16 cores). The calculation is 14 x 1.15 ≈ 16, i.e.

SAPS of [M cores with 2 vCPUs per core] = SAPS of [M x 1.15 cores with 1 vCPU per core]

The above equations are estimates, as we assume linear scalability of SAPS with vCPUs in all the scenarios and an average HT benefit of 15%.
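The two equivalence equations can be sketched as a pair of conversion helpers. This assumes the same linear scaling and 15% average HT benefit as the equations above; the function names are hypothetical, and rounding to the nearest whole core mirrors the "≈" used in this post rather than a conservative round-up.

```python
HT_BENEFIT = 0.15  # average HT benefit assumed in this post


def to_one_vcpu_per_core(cores_with_two_vcpus: int) -> dict:
    """Equivalent-SAPS config with 1 vCPU per core: M x 1.15 cores."""
    cores = round(cores_with_two_vcpus * (1 + HT_BENEFIT))
    return {"cores": cores, "vcpus": cores}


def to_two_vcpus_per_core(cores_with_one_vcpu: int) -> dict:
    """Equivalent-SAPS config with 2 vCPUs per core: M / 1.15 cores."""
    cores = round(cores_with_one_vcpu / (1 + HT_BENEFIT))
    return {"cores": cores, "vcpus": 2 * cores}


# Scenario 2 (28 vCPUs on 14 cores) -> Scenario 3
print(to_one_vcpu_per_core(14))   # {'cores': 16, 'vcpus': 16}
# Scenario 3 (16 vCPUs on 16 cores) -> Scenario 2
print(to_two_vcpus_per_core(16))  # {'cores': 14, 'vcpus': 28}
```

Because of the rounding, converting back and forth is only approximately reversible for arbitrary core counts; the values here happen to round-trip for the 14-core and 16-core examples.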

Conclusion

I have shown above that when sizing VMs we have the option to configure them with 1 vCPU scheduled per core or 2 vCPUs scheduled per core. An equation shows how these options are numerically related. The following table summarizes the difference between the options.

1 https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html

2 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/vmw-vsphere-virtual-saphana-application-workload-guidance-design.pdf

3 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/whitepaper/sap_hana_on_vmware_vsphere_best_practices_guide-white-paper.pdf

Part 2 of this blog will seek to demonstrate some of the concepts discussed here with an actual SAP workload.


About Vas Mitra

Vas is a SAP Solutions Architect who has worked on SAP + business critical apps related initiatives at VMware over the past 5+ yrs. Activities include SAP on VMware training/workshops, POCs, pre-sales support and development of SAP on VMware whitepapers and best practice guides. Prior to VMware Vas’ roles and experiences include: SAP Basis Administrator; SAP ABAP developer; SAP Solutions Engineer. These roles have been with a large Systems Integrator, server vendor and SAP IT operations in pharmaceutical/chemical companies.

2 thoughts on “Hyper-threading Impact on Virtual SAP Sizing and Performance – Part 1 of 2”

  1. Fabian Lenz

    All scenarios seem to assume that the virtual memory of a single node is larger than the NUMA-node memory size?

    Assumption:
    1. Server with 2-Numa Nodes: 2xCPU with 12 Cores (plus HT) and 512 GB memory per CPU.
    2. SAP-Hana VM memory size: 768 GB
    3. Single SAP Hana VM per ESXi

    From a CPU scheduling perspective which of the following scenarios is giving us the maximum performance?

    -> Scenario 1: VM sized with 20vCPU and 768 GB Memory
    -> Scenario 2: VM sized with 40vCPU and 768 GB Memory
    -> Scenario 3: VM sized with 40vCPU and NUMA.PreferHT and 768 GB Memory (which afaik would not really make sense)

    I am not 100% sure if scenario 2 would be beneficial compared to scenario 1, since I would expect higher CPU-ready values on some of the vCPUs compared to Scenario 1.

  2. Vas Mitra Post author

    Hi Fabian
    The scenarios in my blog are independent of whether the VM vCPU count is smaller (non-wide VM) or larger (wide VM) than a NUMA node. For the purposes of sizing estimates we assume linear scalability as the VM vCPU count scales out from a non-wide to a wide VM (in reality it may not be perfectly linear).

    In your Scenarios/Assumption: Scenario 2 should give the best performance. We have found with SAP NetWeaver and HANA workloads that using the extra threads does provide a performance boost (along with higher CPU ready), but this boost may be around 15%. The question is whether you need the extra performance boost. You can start with Scenario 2, but if after monitoring you see that CPU utilization of the VM is still not very high even at peak workload, then you can consider reconfiguring the VM to Scenario 1.

    Rgds

    Vas
