Memory Performance Counters – An Evolved Look at Memory Management

vSphere memory management has evolved over the years, taking advantage of new technologies and techniques such as large memory pages, memory compression, and solid-state devices (SSDs) for swap caching. This evolution has changed the way we need to look at memory usage and memory over-commitment in vSphere virtualized environments. To understand memory management and usage in vSphere, we need to understand the metrics available to us, what they mean, and how to use them.

When monitoring performance, just knowing what performance metrics are available and what they measure is half the battle. A good place to find the performance metrics available in vSphere, along with their definitions, is the vSphere SDK documentation for the PerformanceManager object.

Some of the key memory performance and capacity planning metrics available in vSphere are Active, Consumed, Granted and Entitlement.

Key Memory Metrics for a Virtual Machine (full list available here)

Mem.Active

Amount of guest “physical” memory actively used, as estimated by VMkernel based on recently touched memory pages.

Mem.Consumed

Amount of guest physical memory consumed by the virtual machine for guest memory. Consumed memory does not include overhead memory. It includes shared memory and memory that might be reserved, but not actually used. 

Virtual machine consumed memory = memory granted – memory saved due to memory sharing

Mem.Granted

Guest “physical” memory that is mapped to machine memory. Includes shared memory amount. Does not include overhead.

Mem.Entitlement

Amount of host physical memory the virtual machine is entitled to, as determined by the ESX scheduler.
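
The consumed-memory relation given under Mem.Consumed can be worked through with a small sketch. The function name, the megabyte units, and the sample numbers below are illustrative assumptions, not values taken from the vSphere SDK.

```python
def consumed_mb(granted_mb: float, shared_savings_mb: float) -> float:
    """Consumed = granted - memory saved due to page sharing (values in MB).

    Hypothetical helper for illustration; inputs would come from the
    Mem.Granted counter and the sharing savings reported for the VM.
    """
    if shared_savings_mb < 0 or shared_savings_mb > granted_mb:
        raise ValueError("sharing savings must be between 0 and granted memory")
    return granted_mb - shared_savings_mb


# Made-up example: 8192 MB granted, 1024 MB saved through page sharing.
example_consumed = consumed_mb(8192, 1024)
print(example_consumed)
```

With these sample inputs the VM is charged for 7168 MB of consumed memory even though 8192 MB of guest physical memory is mapped.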

These counters have mostly been around for years, but some are more important than others these days because of changes in how vSphere uses memory. Since vSphere 4.0, vSphere backs all virtual machine memory with large pages. This prevents the vSphere Transparent Page Sharing (TPS) memory reclamation technique, which only operates on small pages, from kicking in as early as it did prior to vSphere 4.0. TPS still kicks in with vSphere 4.0 and later, but only once memory pressure occurs and vSphere needs to actively reclaim memory.

This change in behavior has made the Mem.Consumed metric less important in capacity and memory over-commit planning. These days Mem.Active is more useful for capacity planning, because we are more concerned with how much memory the VM actively needs and is using than with how much it has consumed at some point in time. From a performance standpoint, as long as the virtual machine’s actively used memory is in physical memory, the virtual machine should perform well.

vSphere also has several counters that measure and report on the impact of memory over-commit.

Memory Over-Commit / Performance Counters

Mem.Latency

Percent of time the VM is blocked waiting to access swapped, compressed or ballooned memory.

The larger the number, the larger the impact on VM performance.

Cpu.SwapWait

Time the virtual machine is waiting for swap page-ins.

Mem.VmMemCtl

Amount of memory allocated to the VMware balloon driver in the VM.

Mem.SwapInRate

Rate at which memory is swapped from disk into active memory during the interval.

Mem.Swapped

Current amount of guest physical memory swapped out to the virtual machine’s swap file by the VMkernel. Swapped memory stays on disk until the virtual machine needs it. This statistic refers to VMkernel swapping and not to guest OS swapping.

In the past we used the various swap metrics (particularly Mem.SwapInRate) to detect memory performance issues. The idea was that any swap-in activity, whether in the guest or on the ESXi host, would have a very high impact on virtual machine performance. But with techniques like memory compression and host cache (swap to SSD), the question of whether we are swapping has become less important than the question of how long it takes to swap.

Because using SSDs as a swap cache location and utilizing memory compression can greatly reduce the amount of time it takes to swap memory, vSphere counters like Mem.Latency and Cpu.SwapWait have become more important than Mem.SwapInRate. Mem.Latency and Cpu.SwapWait measure how long the VM was blocked while waiting for required memory to be swapped back in to physical memory. Both counters show the performance impact of swapping more clearly than a simple swap-in rate.
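
To make the distinction concrete, here is a minimal sketch that ranks VMs by time spent blocked rather than by whether any swapping occurred. The class, field names, sample values, and the 1.0 percent warning threshold are all assumptions chosen for illustration; they are not VMware guidance, and the KB/s unit for the swap-in rate is likewise assumed.

```python
from dataclasses import dataclass


@dataclass
class VmMemorySample:
    """One sampling interval for a VM; field names are hypothetical."""
    name: str
    swap_in_rate_kbps: float  # Mem.SwapInRate (unit assumed: KB/s)
    latency_pct: float        # Mem.Latency, percent of time blocked


def swap_impact(sample: VmMemorySample, latency_warn_pct: float = 1.0) -> str:
    """Classify a sample by time blocked, not by swap-in volume."""
    if sample.latency_pct >= latency_warn_pct:
        return "impacted"
    if sample.swap_in_rate_kbps > 0:
        return "swapping, low impact"
    return "ok"


# Made-up samples: a VM swapping heavily from a fast SSD cache versus
# a VM swapping lightly from slow disk.
samples = [
    VmMemorySample("vm-ssd-cache", swap_in_rate_kbps=900.0, latency_pct=0.2),
    VmMemorySample("vm-disk-swap", swap_in_rate_kbps=150.0, latency_pct=6.5),
    VmMemorySample("vm-idle", swap_in_rate_kbps=0.0, latency_pct=0.0),
]
results = {s.name: swap_impact(s) for s in samples}
print(results)
```

In this made-up data the VM with the highest swap-in rate is not the one flagged, which is the point of looking at latency first.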

Metrics like Mem.Consumed and Mem.SwapInRate might have been good metrics to use in the past, but today they are not the best choice when trying to determine what is truly a performance concern. Today, metrics like Mem.Active and Mem.Latency report memory utilization/capacity and performance more accurately.
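
On the capacity side, the difference between the two views can be sketched by summing each metric across the VMs on a host and comparing the totals with the host's physical memory. The helper name, the host size, and the per-VM figures are hypothetical, and a real sizing exercise would look at peaks over time rather than a single sample.

```python
def demand_fraction(per_vm_mb: list[float], host_memory_mb: float) -> float:
    """Fraction of host physical memory represented by the summed metric."""
    if host_memory_mb <= 0:
        raise ValueError("host memory must be positive")
    return sum(per_vm_mb) / host_memory_mb


HOST_MEMORY_MB = 65536  # hypothetical 64 GB host

# Made-up (Mem.Active, Mem.Consumed) pairs in MB for four VMs.
vms = [(2048, 8192), (1024, 16384), (4096, 32768), (1024, 16384)]

active_fraction = demand_fraction([a for a, _ in vms], HOST_MEMORY_MB)
consumed_fraction = demand_fraction([c for _, c in vms], HOST_MEMORY_MB)
print(active_fraction, consumed_fraction)
```

Here the consumed view suggests the host is already over-committed (1.125 of physical memory), while the active view shows the working sets occupying only 0.125 of it.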

If you attended VMworld 2012 – US or plan on attending VMworld 2012 – Europe, you can listen to the recording of Session INF-VSP1729 “Understanding Virtualized Memory Performance Management” to get a deeper dive on vSphere memory management. Several other VMworld 2012 presentations are also available online to VMworld attendees (VMworld 2012 Online Sessions) with both slides and audio recording.