A vRealize Operations Super Metric for VM NUMA Optimization

In the mad rush leading up to VMworld, I saw a cool new fling by Mark Achtemichuk and Mark McGill that provides a report for optimizing your VM CPU configuration for NUMA boundaries. We get a lot of questions about this from customers, who want to know whether vRealize Operations can provide the same information. Combining the type of information the fling provides with right-sizing information from vRealize Operations lets you hyper-size your virtual machines so that they are not only configured with the correct amount of CPU and memory but also perform optimally given the host NUMA configuration.

At first, I thought I would figure out a way to bring the data from the PowerShell script into vRealize Operations via the REST APIs, but then I thought, “Hey, super metrics!” The result is what I believe to be the longest super metric ever written! Well, I don’t know that for sure, but it’s long and I’ll break it down for you here.

Algorithmic Approach

I started out thinking that I would write a series of shorter super metrics, which computed the values I needed for the optimal sockets and cores. However, I decided to do this with as few super metrics as possible and ended up with just two. While this makes the formulas complex, it has the advantage that you don’t have a lot of super metrics per VM that are basically placeholders and could impact your cluster sizing in the long run.

Since I needed to write a very lengthy super metric, I built a logical flow chart to guide me. This flow is based on the work done by Mark on the fling as well as his fantastic blog on NUMA optimization.

The “if-then” logic in the flow requires use of the ternary operator in the super metric formula. Basically, this allows you to evaluate a value to determine if it is true or false and then provide a result based on that evaluation.

The basic ternary operator is:

value ? result_if_true : result_if_false

But what is cool about the ternary operator is that you can nest other operations, including other ternary operators, inside it! This allowed me to capture the full logic tree above in one formula, since the result of the formula is a single value.
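To illustrate how nesting collapses a decision tree into one value, here is a small Python sketch. Python spells the ternary as `result_if_true if condition else result_if_false`. The rule itself (fit within one host socket, otherwise spread evenly across host sockets, otherwise fall back to one socket) and the function and parameter names are simplified, hypothetical stand-ins chosen for illustration. This is not the actual super metric formula or the fling's algorithm.

```python
def pick_sockets(vcpus: int, host_sockets: int, cores_per_host_socket: int) -> int:
    """Hypothetical, simplified rule used only to show nested conditionals."""
    return (
        1 if vcpus <= cores_per_host_socket            # VM fits inside one host socket
        else (
            host_sockets if vcpus % host_sockets == 0  # vCPUs split evenly across sockets
            else 1                                      # placeholder fallback for this sketch
        )
    )


print(pick_sockets(8, 2, 12))   # 1
print(pick_sockets(16, 2, 12))  # 2
print(pick_sockets(15, 2, 12))  # 1
```

Each `else` branch can hold another complete conditional, which is how a whole flow chart ends up as one expression that returns one number.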

Although I was able to get the entire logic flow into a single super metric, the solution includes two super metrics. One returns the optimal number of sockets. The other returns the optimal number of cores per socket. Both formulas share the core logic above, but the cores-per-socket formula takes the optimal sockets value and performs one more calculation.
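The post does not spell out what that one extra calculation is. A natural assumption is that it divides the VM's vCPU count by the optimal socket count, and the sketch below shows only that assumed step. The function name and the error handling are hypothetical.

```python
def cores_per_socket(vcpus: int, optimal_sockets: int) -> int:
    """Assumed final step: spread the VM's vCPUs evenly over the chosen sockets."""
    if optimal_sockets <= 0:
        raise ValueError("optimal_sockets must be positive")
    if vcpus % optimal_sockets != 0:
        # An uneven split has no valid cores-per-socket value.
        raise ValueError("vcpus must divide evenly across sockets")
    return vcpus // optimal_sockets


print(cores_per_socket(16, 2))  # 8
```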

Challenges and Caveats

Of course, using super metrics in this way, while powerful, is challenging. When you start to nest ternary operators, things run together and it’s hard to keep up with matching brackets and parentheses. Several times, I found myself copying the super metric into Atom so I could apply some visual structure with tabs and line breaks.

The super metric editor is fine for shorter formulas but this beast presented some challenges!

I spent most of my time correcting mismatch issues in my formula and this approach at least let me divide things up into chunks that showed the logical flow.
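If you hit the same mismatch problems, a short script can point at the first offending bracket before you paste the formula back into the editor. This is a generic stack-based check, not a vRealize Operations feature, and the function name is made up for this sketch. It also knows nothing about super metric syntax, so brackets inside quoted strings would be counted too.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def first_mismatch(formula: str):
    """Return the index of the first bracket problem, or None if balanced."""
    stack = []  # (opening character, index) for every bracket still open
    for i, ch in enumerate(formula):
        if ch in PAIRS.values():
            stack.append((ch, i))
        elif ch in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack[-1][0] != PAIRS[ch]:
                return i
            stack.pop()
    # Anything left on the stack was opened but never closed.
    return stack[-1][1] if stack else None


print(first_mismatch("(a ? (b) : c)"))  # None
print(first_mismatch("(a ? (b : c)"))   # 0
print(first_mismatch("(a ? b] : c)"))   # 6
```

The returned index is a character position in the formula string, so you can jump straight to it in a text editor.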

Testing and Validating

I enlisted the help of my peer, Brandon Gordon, to help test the results. Brandon created a view that shows the current and optimal CPU configurations as well as recommended sizing for the virtual machine, giving that “all in one” hyper-optimization result.

I was able to export that view as a CSV and compare it to the result I got from the fling. The results were spot on!
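A comparison like this is easy to script. The sketch below assumes both files share a VM name column and columns for sockets and cores per socket. The column names (`VM`, `Sockets`, `CoresPerSocket`) and the sample rows are invented for illustration and will not match the real export or the fling's report without adjustment.

```python
import csv
import io


def load(text: str, key: str = "VM") -> dict:
    """Index CSV rows by VM name."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff(view_csv: str, fling_csv: str, fields=("Sockets", "CoresPerSocket")) -> list:
    """List (vm, field, view_value, fling_value) for every disagreement."""
    view, fling = load(view_csv), load(fling_csv)
    mismatches = []
    for vm in sorted(view.keys() & fling.keys()):  # only VMs present in both files
        for field in fields:
            if view[vm][field] != fling[vm][field]:
                mismatches.append((vm, field, view[vm][field], fling[vm][field]))
    return mismatches


# Made-up sample data standing in for the two exports.
view_export = "VM,Sockets,CoresPerSocket\napp01,1,8\ndb01,2,8\n"
fling_report = "VM,Sockets,CoresPerSocket\napp01,1,8\ndb01,2,6\n"

print(diff(view_export, fling_report))  # [('db01', 'CoresPerSocket', '8', '6')]
```

An empty list means the two sources agree for every VM they have in common.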

With one exception, that is: if you happen to have hosts with single-core sockets, you’ll get some strange results. I debated putting in logic to handle this but honestly, who does that? We only ran into it in our testing because some of our nested hosts (virtualized ESXi) are configured that way.

Check It Out

By all means, if you have not seen the fling yet, go and get it. It should be considered the authoritative approach and everything in the super metric is borrowed from that work.

If you would like to grab the super metric, you can find it here (along with other great samples of super metrics, dashboards and vRealize Orchestrator workflows for vRealize Operations).




2 comments have been added so far

  1. Works great. I would recommend adding one more thing to check: whether the CPU Hotplug feature is on or off, since having it on effectively disables vNUMA and makes any optimizations on VMs with two or more vSockets ineffective.

    1. Great suggestion! I think this needs a companion view or dashboard and that way I can include your recommendation as well as other items of interest (recommended size, etc).
