Hyper-threading Impact on Virtual SAP Sizing and Performance – Part 1 of 2

This is part 1 of a two-part blog covering how hyper-threading affects virtual SAP sizing and performance. Many virtual SAP deployments use Intel’s hyper-threading (HT) technology. For each physical processor core, the hypervisor sees two logical processors and shares the workload between them when possible. A vCPU can be scheduled on one logical processor of a core while the other logical processor of that core is idle; this blog refers to that as one vCPU scheduled per core. Alternatively, two vCPUs can be scheduled on the two logical processors of the same core; this is referred to as two vCPUs scheduled per core. For more background on vSphere scheduling functionality, please see the whitepaper The CPU Scheduler in VMware vSphere.

I will show three different sizing scenarios.

Scenario 1

The first scenario above shows:

14 physical cores with HT enabled (28 logical CPUs).
A virtual machine (VM) with 14 vCPUs.
vSphere will schedule each vCPU on a logical CPU on a separate dedicated physical core (default behavior). The scheduler prefers a whole idle core, where both logical CPUs of the core are idle, over a partial idle core, where one logical CPU is idle while the other is busy.
There is spare capacity for more performance as not all the logical CPUs are utilized.
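The placement preference described above can be illustrated with a toy model. This is only a sketch: the function name `place_vcpus` and the greedy, static placement are assumptions made for illustration, and the real vSphere scheduler is dynamic and considerably more sophisticated.

```python
def place_vcpus(num_cores: int, num_vcpus: int) -> list[list[int]]:
    """Toy placement: use whole idle cores first, then partially idle cores.

    Returns, for each core, the list of vCPU ids placed on its logical CPUs.
    Hypothetical illustration only; this is not the vSphere algorithm.
    """
    if num_vcpus > 2 * num_cores:
        raise ValueError("more vCPUs than logical CPUs")
    cores: list[list[int]] = [[] for _ in range(num_cores)]
    for vcpu in range(num_vcpus):
        # Prefer a core whose two logical CPUs are both idle.
        target = next((c for c in cores if len(c) == 0), None)
        if target is None:
            # Otherwise fall back to a core with one idle logical CPU.
            target = next(c for c in cores if len(c) == 1)
        target.append(vcpu)
    return cores


# Scenario 1: 14 vCPUs on 14 cores (28 logical CPUs).
placement = place_vcpus(num_cores=14, num_vcpus=14)
idle_logical_cpus = 2 * 14 - sum(len(c) for c in placement)
print(max(len(c) for c in placement))  # 1 (one vCPU per core)
print(idle_logical_cpus)               # 14 (spare logical CPUs)
```

In this toy model, a second vCPU lands on a core only once every core already hosts one, which mirrors the one-vCPU-per-core outcome of Scenario 1.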

Scenario 2

The scenario above shows:

A virtual machine with 28 vCPUs.
vSphere schedules the vCPUs across all the logical CPUs – the two logical CPUs of each physical core are both utilized. This can be achieved a number of ways:
- Setting manual CPU affinity in the VM to force the vCPUs to be scheduled on specific logical CPUs.
- Provisioning more vCPUs than there are cores on the host.
- Deploying a VM with twice as many vCPUs as there are cores in a socket and setting the VM-level parameter “numa.vcpu.preferHT” to TRUE (the host-wide equivalent is “Numa.PreferHT”). All the vCPUs will then be scheduled across the logical CPUs within the socket/NUMA node.
Using all the logical CPUs in Scenario 2 provides, on average, about a 15% boost in SAP performance/transaction throughput compared to Scenario 1. SAP sizing measures transaction throughput and performance in the metric “SAPS”, so Scenario 2 provides about 15% more SAPS than Scenario 1.
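As a rough illustration of the 15% figure, the comparison between Scenario 1 and Scenario 2 can be sketched as below. The per-core value `SAPS_PER_CORE` is a made-up placeholder, not a benchmark result, and the helper `host_saps` is a hypothetical name; the sketch also assumes SAPS scales linearly with busy cores.

```python
HT_BENEFIT = 0.15  # average hyper-threading gain quoted in this blog


def host_saps(cores: int, saps_per_core: float, vcpus_per_core: int) -> float:
    """Estimate SAPS for `cores` busy cores with 1 or 2 vCPUs scheduled per core."""
    if vcpus_per_core not in (1, 2):
        raise ValueError("vcpus_per_core must be 1 or 2")
    # Two vCPUs per core earn the HT benefit; one vCPU per core is the baseline.
    factor = 1.0 + HT_BENEFIT if vcpus_per_core == 2 else 1.0
    return cores * saps_per_core * factor


SAPS_PER_CORE = 2000.0  # hypothetical placeholder, not a measured value

scenario1 = host_saps(14, SAPS_PER_CORE, vcpus_per_core=1)  # 14 vCPUs on 14 cores
scenario2 = host_saps(14, SAPS_PER_CORE, vcpus_per_core=2)  # 28 vCPUs on 14 cores
print(f"Scenario 1: {scenario1:.0f} SAPS")
print(f"Scenario 2: {scenario2:.0f} SAPS ({scenario2 / scenario1 - 1:.0%} more)")
```

Whatever per-core value is plugged in, the ratio between the two scenarios stays at the assumed HT benefit.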

Scenario 3

This scenario shows:

16 physical cores with HT enabled (32 logical CPUs).
A virtual machine with 16 vCPUs. vSphere will schedule each vCPU on a logical CPU on a separate dedicated physical core (default behavior) – same as Scenario 1.
The performance/SAPS throughput is approximately the same as Scenario 2 (based on 15% HT benefit).
- If we scale vCPUs and cores linearly from Scenario 1, adding 15% more vCPUs (and cores) provides performance equivalent to Scenario 2.
- Scaling up vCPUs in Scenario 1 by 15% = 1.15 x 14 ≈ 16 vCPUs (on 16 cores) – this is Scenario 3.

Comparing Scenarios

SAP sizing involves calculations in SAPS. You can see an example at https://blogs.vmware.com/apps/2017/06/awg_s4hana_part1.html#more-2217 . The methodology and example shown there enable you to calculate the number of vCPUs required for business requirements expressed in SAPS. You then have the option to design the VMs like Scenario 2 or Scenario 3:

If we need 16 vCPUs on 16 cores (Scenario 3), an alternative configuration with fewer cores and equivalent SAPS performance is Scenario 2 (28 vCPUs on 14 cores). The calculation is 16 / 1.15 ≈ 14, i.e.:

M = # of cores utilized (either 2 vCPUs or 1 vCPU scheduled per core)

SAPS of [M cores with 1 vCPU per core] = SAPS of [M / 1.15 cores with 2 vCPUs per core]

If we need 28 vCPUs on 14 cores (Scenario 2), an alternative configuration with equivalent SAPS, fewer vCPUs but more cores, is Scenario 3 (16 vCPUs on 16 cores). The calculation is 14 x 1.15 ≈ 16, i.e.:

SAPS of [M cores with 2 vCPUs per core] = SAPS of [M x 1.15 cores with 1 vCPU per core]

The above equations are estimates as we assume linear scalability of SAPS with vCPUs in all the scenarios and an average HT benefit of 15%.
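The two conversions can be written as a short sketch. The function names are hypothetical, the 15% benefit is the average assumed in this blog, and results are rounded to the nearest whole core to match the figures used in the scenarios.

```python
HT_BENEFIT = 0.15  # average HT benefit assumed in this blog


def cores_with_two_vcpus_per_core(cores_one_per_core: float) -> float:
    """Cores needed at 2 vCPUs per core to match M cores at 1 vCPU per core."""
    return cores_one_per_core / (1 + HT_BENEFIT)


def cores_with_one_vcpu_per_core(cores_two_per_core: float) -> float:
    """Cores needed at 1 vCPU per core to match M cores at 2 vCPUs per core."""
    return cores_two_per_core * (1 + HT_BENEFIT)


# Scenario 3 -> Scenario 2: 16 cores at 1 vCPU per core.
cores_s2 = round(cores_with_two_vcpus_per_core(16))  # 13.9 rounds to 14
print(cores_s2, 2 * cores_s2)  # 14 28 (cores, vCPUs)

# Scenario 2 -> Scenario 3: 14 cores at 2 vCPUs per core.
cores_s3 = round(cores_with_one_vcpu_per_core(14))   # 16.1 rounds to 16
print(cores_s3, cores_s3)      # 16 16 (cores, vCPUs)
```

Rounding to the nearest core reproduces the numbers in this blog; a more conservative sizing might round up instead.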

Conclusion

I have shown above that, when sizing VMs, we have the option to configure them with 1 vCPU scheduled per core or 2 vCPUs scheduled per core. The equations above show how these options are numerically related. The following table summarizes the differences between the options.