Memory metrics for virtual machines have changed in recent releases of vRealize Operations to improve how you manage your SDDC. In this blog, I will explain these changes and how they help you. To start things off, let’s define a couple of important metrics.
By definition, Active Memory is the “amount of memory that is actively used, as estimated by VMkernel based on recently touched memory pages.” Since a virtual machine isn’t constantly touching every memory page, this metric essentially represents how aggressively the virtual machine is using the memory allocated to it. What this means is that memory utilization, as seen from within the guest OS via Task Manager or top, will almost always be greater than Active Memory. Refer to Understanding vSphere Active Memory if you want to read a more detailed explanation of Active Memory. In vRealize Operations, this metric is called Memory|Non Zero Active (KB).
By definition, Consumed Memory is the “amount of host physical memory consumed by a virtual machine, host, or cluster.” The vSphere documentation also states that “VM consumed memory = memory granted – memory saved due to memory sharing.” What this means is Consumed Memory can include memory that the guest OS considers free. If you compare Task Manager or top to Consumed Memory, you will see that Consumed Memory is almost always larger. In vRealize Operations, this metric is called Memory|Consumed (KB).
Here is a screenshot comparing Task Manager with Memory|Non Zero Active (KB) and Memory|Consumed (KB).
Now that we know what Active and Consumed memory are and how they relate to what the guest OS shows, it’s time for a short history lesson of vRealize Operations (and you thought you were done with history after high school).
vRealize Operations 6.6.1 and Older
vRealize Operations 6.6.1 and earlier relied on Active Memory when calculating utilization and demand, so memory utilization always appeared lower than what you see in the guest OS. The capacity engine also used Active Memory, which meant sizing recommendations were based on it as well, and those recommendations were usually quite aggressive.
vRealize Operations 6.3
The release of vRealize Operations 6.3 brought support for collecting in-guest metrics via VMware Tools. These metrics weren’t used by any vRealize Operations content (yet), but they were available for you to use. This was an awesome addition because it gave additional visibility into the guest’s perspective without needing another agent. As you can see from the list of metrics below, this meant memory utilization was now available. Note that not all these metrics are collected by default, but you can enable the ones you need using policies.
Guest metrics added in vRealize Operations 6.3:
- Guest|Active File Cache Memory (KB)
- Guest|Context Swap Rate per second
- Guest|Free Memory (KB)
- Guest|Huge Page Size (KB)
- Guest|Needed Memory (KB)
- Guest|Page In Rate per second
- Guest|Page Out Rate per second
- Guest|Page Size (KB)
- Guest|Physically Usable Memory (KB)
- Guest|Remaining Swap Space (KB)
- Guest|Total Huge Pages
vRealize Operations 6.7
The release of vRealize Operations 6.7 was a milestone release because it really helped improve usability and simplify how you use the product. There are a few critical changes related to memory monitoring, such as a brand-new capacity engine and the elimination of redundant and unnecessary metrics. The most important change related to memory metrics is that the capacity engine now uses the Guest|Needed Memory (KB) metric, collected via VMware Tools, which was added in vRealize Operations 6.3. This change greatly improves the quality of projections from the capacity engine as well as rightsizing.
There are some situations where the guest memory metrics can’t make it to vRealize Operations, such as VMware Tools not being installed or running an unsupported version. Knowing that the data may not always be available, Consumed Memory is used as the fallback. Consumed Memory was selected as the fallback metric because, as shown above, it’s more conservative than Active Memory. The primary metrics affected by these changes are Memory|Usage (%) and Memory|Utilization (KB).
Typically, you would see that Guest|Needed Memory (KB) and Memory|Utilization (KB) are nearly identical (unless there is an issue collecting the metric from VMware Tools). If there is an issue collecting Guest|Needed Memory (KB), you will see that it correlates with Memory|Consumed (KB) instead.
Memory|Utilization (KB) is the metric used by the capacity engine and therefore rightsizing recommendations. As you can see, it’s advantageous to ensure that Guest|Needed Memory (KB) is collecting from VMware Tools to get the best quality recommendations.
By now, I’m sure you’re wondering about the actual formula used. If guest metrics from VMware Tools are collecting, Memory|Utilization (KB) = Guest|Needed Memory (KB) + ( Guest|Page In Rate per second * Guest|Page Size (KB) ) + Memory|Total Capacity (KB) – Guest|Physically Usable Memory (KB). If guest metrics from VMware Tools are not collecting, Memory|Utilization (KB) = Memory|Consumed (KB).
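The formula above can be sketched in code. Here is a minimal Python sketch, assuming the metric values are supplied as plain numbers in KB; the function name and parameters are illustrative, not a vRealize Operations API:

```python
def memory_utilization_kb(consumed_kb, needed_kb=None, page_in_rate=None,
                          page_size_kb=None, total_capacity_kb=None,
                          physically_usable_kb=None):
    """Sketch of the Memory|Utilization (KB) formula described above.

    When guest metrics from VMware Tools are collecting, use the
    Needed Memory formula; otherwise fall back to Consumed Memory.
    """
    guest_metrics = (needed_kb, page_in_rate, page_size_kb,
                     total_capacity_kb, physically_usable_kb)
    if all(value is not None for value in guest_metrics):
        return (needed_kb
                + page_in_rate * page_size_kb   # memory being paged in
                + total_capacity_kb - physically_usable_kb)
    # Guest metrics unavailable: fall back to the conservative metric.
    return consumed_kb
```

Note the fallback path depends only on Memory|Consumed (KB), which is why VMs without working guest metrics still get a utilization value, just a less precise one.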
vRealize Operations 7.0, 7.5, 8.0, and 8.1
With the release of vRealize Operations 7.0, there was a tweak made to Memory|Usage (%) based on customer feedback. Memory|Usage (%) was changed to prefer Guest|Needed Memory (KB) from VMware Tools, but it now fails back to Memory|Non Zero Active (KB) if it’s not available. This change allows you to use Memory|Usage (%) to show an aggressive percentage and Memory|Workload (%) to show a conservative percentage in dashboards and reports.
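The difference between the two percentages comes down to which metric each one falls back to. A minimal Python sketch, where `None` stands in for “guest metrics not collecting” (the function names are illustrative, not product APIs):

```python
def memory_usage_pct(needed_kb, active_kb, capacity_kb):
    """Memory|Usage (%) after vRealize Operations 7.0: prefers guest
    Needed Memory, fails back to Active Memory (the aggressive number)."""
    used_kb = needed_kb if needed_kb is not None else active_kb
    return 100.0 * used_kb / capacity_kb

def memory_workload_pct(needed_kb, consumed_kb, capacity_kb):
    """Memory|Workload (%): fails back to Consumed Memory instead
    (the conservative number)."""
    used_kb = needed_kb if needed_kb is not None else consumed_kb
    return 100.0 * used_kb / capacity_kb
```

When guest metrics are collecting, both return the same value; they only diverge on VMs where VMware Tools data is missing.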
Memory|Utilization (KB) remains unchanged from vRealize Operations 6.7. Memory|Utilization (KB) is still the metric used by the capacity engine and rightsizing recommendations. Again, it’s important to ensure that Guest|Needed Memory (KB) is collecting from VMware Tools to get the best quality recommendations from vRealize Operations.
Now that you know the history, I’m sure you’re wondering how to ensure this feature is working optimally. As you can see, there are many components needed for it to work, and it’s important to ensure each of these requirements is met. For more information on the requirements, refer to KB 55675.
- vCenter Server 7.0, vCenter Server 6.5 U3, vCenter Server 6.7 U3, or newer
- ESXi 6.0 U1 or newer
- Ensure the vRealize Operations Manager VMware vSphere adapter credentials have the Performance > Modify intervals privilege enabled in the target vCenter(s). See Minimum Collection User Permissions in vRealize Operations Manager 6.x and later for more information.
- VMware Tools 10.3.2 Build 10338 or newer for Windows
- VMware Tools 9.10.5 Build 9541 or newer for 64-bit Linux
- VMware Tools 10.3.5 Build 10341 or newer for 32-bit Linux
- Older versions of vCenter may require disconnecting and reconnecting the host from vCenter as mentioned in KB 55675.
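The Tools version requirements above lend themselves to an automated check. Here is a minimal Python sketch using the build numbers from the list; how you obtain each VM’s guest family and Tools build (for example, from a vSphere inventory query) is left out and assumed:

```python
# Minimum VMware Tools builds per guest type, from the requirements above.
MIN_TOOLS_BUILD = {
    "windows": 10338,   # VMware Tools 10.3.2
    "linux64": 9541,    # VMware Tools 9.10.5
    "linux32": 10341,   # VMware Tools 10.3.5
}

def tools_meet_requirement(guest_family, tools_build):
    """Return True if the reported Tools build meets the minimum for
    the guest family; unknown families are flagged for manual review."""
    minimum = MIN_TOOLS_BUILD.get(guest_family)
    if minimum is None:
        return False  # not in the supported list above
    return tools_build >= minimum
```

A check like this won’t catch the vCenter-side disconnect/reconnect workaround, but it quickly narrows down which VMs need a Tools upgrade.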
I realize the list of requirements is long and it can be challenging to track in large environments. To help, I’ve created a dashboard that identifies VMs that aren’t collecting memory metrics from the guest OS. You can find the dashboard along with install instructions on the vRealize Operations Dashboard Sample Exchange site.
Here’s to better managing your memory and your capacity!
Edit (February 6, 2019): Added Performance > Modify intervals privilege requirement for VMware vSphere adapter credentials.
Edit (July 11, 2019): Added vCenter Server 6.5 U3 to Validation section as not needing the workaround mentioned in KB 55675.
Edit (September 12, 2019): Added vCenter 6.7 U3 to Validation section as not needing the workaround mentioned in KB 55675.
Edit (December 11, 2019): Added vRealize Operations 8.0.
Edit (April 15, 2020): Added vRealize Operations 8.1.
Edit (April 30, 2020): Added vCenter Server 7.0.
5 comments have been added so far
Thank you very much for your recent article. This is great since it clarifies many problems about memory metrics.
Hypervisor metrics vs guest OS metrics and their evolution thru vROps versions from v6.x to v7.0.
Your article clarified all the findings that I could not understand but could see by observing memory data on more than a thousand VMs that we’re monitoring with vROps over the past 3 years.
We went thru vROps versions 6.1, 6.3, 6.5, 6.7 and finally 7.0.
I saw dramatic changes in the memory metric values across the vROps versions, which also impacted the alert thresholds and therefore the number of memory alerts!
On Linux VM’s, we use the “Guest|Remaining Swap Space(KB)” to trigger memory alerts at Guest OS level.
This works very well according to our Linux OS system engineers and based on my 10 years of monitoring experience also with other monitoring tools.
On Windows VM’s, can you please kindly help us to choose the right metric in order to trigger Guest OS memory alerts ?
I cannot rely on “Guest|Free Memory(KB)” since it is often near 0 while the memory usage (%) shows only 50% or even less.
The ratio between 2 metrics “Guest|Physically Usable Memory(KB)” and “Guest|Needed Memory(KB)” seems to show the real usage at OS level.
Now with vROps v7, when we have VMtools version 10338, our Windows VM’s show this ratio as the metric “Memory|Usage(%)” (or Workload).
Do you advise that I use this “Memory|Usage(%)” metric to trigger memory alerts on our Windows VM’s, or is it safer to make a super metric based 100% on guest metrics, I mean the ratio between “Guest|Physically Usable Memory(KB)” and “Guest|Needed Memory(KB)”?
Thank you so much for your very important article, which helped me a lot, and for your advice.
Thanks for the feedback Mehmet. I’d recommend looking at Memory|Workload (%). The difference between Memory|Usage (%) and Memory|Workload (%) is that the former fails back to Active Memory and the latter fails back to Consumed Memory if there are issues collecting data from the guest OS via VMware Tools. So, I would not recommend Memory|Usage (%) because it will nearly always be lower than what the guest OS shows. If you choose to use Memory|Workload (%), you can leverage the existing symptoms “Virtual machine memory workload is at Warning level”, “Virtual machine memory workload is at Immediate level”, and “Virtual machine memory workload is at Critical level” for Virtual Machine objects in your alert. Lastly, if you want to limit the alert to only VMs that are collecting guest memory metrics from VMware Tools, you can add a symptom to the alert that requires Guest|Needed Memory (KB) to be greater than 0.
Thank you very much for your answer and kind help.
I checked the Memory|Workload(%) value in vROps against what we see in the VMs’ guest OS memory metrics: the values are correct and correspond.
Your help and advice are much appreciated.
So I’m currently on vROps 6.7 and have created the dashboard for Guest Needed Memory Not Collecting. I made sure all the requirements are met in order to get the VMware Tools to report the guest needed memory. The final requirement of disconnecting/reconnecting the hosts fixed the issue of the guest needed memory not collecting. However, I found out that when I put hosts in maintenance mode and take them out of maintenance mode, the guest needed memory is not collecting again. What a pain! Will upgrading to vROps 7.0 (or the latest release) fix this issue? I am most concerned with the oversized VMs report now showing that there’s not much to reclaim memory-wise.
Unfortunately, as you’ve noticed, you may need to perform the disconnect/reconnect if the issue returns. The dashboard I mention in the blog post has alerts and a vRO workflow that you can use to completely automate the workaround. Unfortunately, upgrading vRealize Operations will not fix the issue. The fix, once available, will require an update to vCenter, not vRealize Operations.