
NUMA Observer Fling Helps Admins Find Problems after Setting NUMA Affinities

When admins set unique logical core and/or NUMA node affinities to achieve low latency (for apps like SAP HANA, other big databases, and critical applications), they can miss overlapping affinities, and maintenance and HA events can later change which VMs share a host:

  • High availability (HA) events can restart VMs on other hosts with spare capacity. Those hosts might already be running VMs that were tuned for low latency by constraining them to the same set of cores/NUMA nodes. The result is multiple VMs constrained and scheduled to the same set of logical cores, and such overlapping affinities can cause CPU contention and/or remote allocation of memory.
  • In some cases, VMs might be restarted on hosts with different configured NUMA node capacities, and this could result in application performance degradation.
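To make the second case concrete, here is a minimal Python sketch of why a VM sized for one host's NUMA nodes may no longer fit a single node after a restart elsewhere. The class names, the `fits_one_node` helper, and all the numbers are hypothetical and chosen only for illustration; real placement decisions by the scheduler involve more factors than this simple comparison.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    cores: int       # logical cores in one NUMA node (illustrative value)
    memory_gb: int   # memory attached to that node (illustrative value)


@dataclass
class VmSize:
    vcpus: int
    memory_gb: int


def fits_one_node(vm: VmSize, node: NumaNode) -> bool:
    """Return True if the VM's vCPUs and memory both fit inside one NUMA node."""
    return vm.vcpus <= node.cores and vm.memory_gb <= node.memory_gb


# Hypothetical VM that was sized against the nodes of its original host.
vm = VmSize(vcpus=24, memory_gb=384)
original_host_node = NumaNode(cores=24, memory_gb=512)
failover_host_node = NumaNode(cores=16, memory_gb=256)

print(fits_one_node(vm, original_host_node))  # fits in one node
print(fits_one_node(vm, failover_host_node))  # does not fit in one node
```

When the check fails, the VM has to span more than one node on the new host, so some of its memory accesses become remote, which is one way the degradation described above can show up.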

The NUMA Observer fling provides a solution to these problems by giving admins a way to identify the most severely impacted VMs and the overlapping affinities causing resource contention.

NUMA Observer scans the VM inventory in vCenter or on individual hosts and identifies VMs with overlapping core/NUMA affinities. The fling also collects statistics on the remote memory usage and CPU starvation of critical VMs. After NUMA Observer collects this information, it generates alerts in its user interface.
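The overlap check itself can be illustrated with a short Python sketch. This is not NUMA Observer's implementation: the inventory is a hand-written list of dictionaries rather than data pulled from vCenter, the field names (`name`, `host`, `cpu_affinity`) are hypothetical, and the affinity strings are assumed to be comma-separated IDs and ranges such as `0-3,8`.

```python
from collections import defaultdict
from itertools import combinations


def parse_affinity(spec):
    """Expand a string such as "0-3,8" into the set {0, 1, 2, 3, 8}."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # an empty string means no affinity is set
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


def find_overlaps(inventory):
    """Return (host, vm_a, vm_b, shared_ids) for each overlapping pair on one host."""
    by_host = defaultdict(list)
    for vm in inventory:
        by_host[vm["host"]].append((vm["name"], parse_affinity(vm["cpu_affinity"])))

    overlaps = []
    for host, vms in sorted(by_host.items()):
        # Only VMs on the same host can contend for the same logical cores.
        for (name_a, cpus_a), (name_b, cpus_b) in combinations(vms, 2):
            shared = cpus_a & cpus_b
            if shared:
                overlaps.append((host, name_a, name_b, sorted(shared)))
    return overlaps


# Hypothetical inventory: hana-02 landed on esx-a after an HA restart.
inventory = [
    {"name": "hana-01", "host": "esx-a", "cpu_affinity": "0-15"},
    {"name": "hana-02", "host": "esx-a", "cpu_affinity": "12-23"},
    {"name": "db-03", "host": "esx-a", "cpu_affinity": "24-31"},
    {"name": "hana-04", "host": "esx-b", "cpu_affinity": "0-15"},
]

for host, vm_a, vm_b, shared in find_overlaps(inventory):
    print(f"{host}: {vm_a} and {vm_b} share logical cores {shared}")
```

Grouping by host first keeps the pairwise comparison small and avoids flagging VMs that use the same core IDs on different hosts, as hana-01 and hana-04 do here. The same set intersection would apply to NUMA node affinities.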

The VM Alerts page in NUMA Observer is shown below.

Figure 1. VM Alerts web page for NUMA Observer.

We thank the following members of the SAP field and performance teams for their suggestions and feedback: Erik Rieger, Wolfgang Weith, Sebastian Lenz, Sathya Krishnaswamy, and Wolfram Weber.