Disable ballooning?

During Partner Exchange I've had multiple discussions about disabling ballooning, specifically about the recommendation to disable ballooning when running particular workloads such as SQL and Oracle. The goal of this recommendation is usually to stop the VMkernel from reclaiming memory, but unfortunately disabling ballooning does not achieve that. This article describes why ballooning is helpful and how to achieve your goals by utilizing other resource management settings.

Let’s stress the most important bit immediately: disabling the ballooning mechanism does not disable memory reclamation. It just disables the most intelligent mechanism of the entire memory management stack.

Why is disabling the ballooning mechanism bad?
Many organizations that deploy virtual infrastructures rely on memory over-commitment to achieve higher consolidation ratios and higher memory utilization. In a typical virtual infrastructure not every virtual machine is actively using its assigned memory at the same time and not every virtual machine is making use of its configured memory footprint.

To allow memory over-commitment, the VMkernel uses different virtual machine memory reclamation mechanisms:

Transparent Page Sharing
Ballooning
Memory compression
Host swapping

Except for Transparent Page Sharing, all memory reclamation techniques only become active when the ESXi host experiences memory contention. The VMkernel will use a specific memory reclamation technique depending on the amount of free memory on the host. When the ESXi host has 6% or less free memory available, it will use the balloon driver to reclaim idle memory from virtual machines. The VMkernel selects the virtual machines with the largest amounts of idle memory (detected by the idle memory tax process) and asks each of them to select idle memory pages.
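To make the selection step concrete, here is a minimal Python sketch. The 6% threshold comes from the paragraph above; the VM class, its fields and the balloon_candidates function are hypothetical names used for illustration only. The real VMkernel logic is more involved (it also weighs shares and the idle memory tax), so treat this as a simplified model rather than the actual algorithm.

```python
from dataclasses import dataclass

BALLOON_THRESHOLD = 0.06  # free-memory fraction quoted in the text above


@dataclass
class VM:
    name: str
    granted_mb: int  # memory currently backed by host physical memory
    active_mb: int   # memory the guest touched recently (an estimate)

    @property
    def idle_mb(self) -> int:
        # Idle memory is modelled as granted minus recently active memory.
        return max(self.granted_mb - self.active_mb, 0)


def balloon_candidates(host_total_mb, host_free_mb, vms):
    """Return VMs ordered by idle memory (largest first) when host free
    memory is at or below the threshold; otherwise return an empty list."""
    if host_free_mb / host_total_mb > BALLOON_THRESHOLD:
        return []
    return sorted(vms, key=lambda vm: vm.idle_mb, reverse=True)
```

In this model a virtual machine with a small configured size but a lot of idle memory is asked to give up pages before a large, busy database server.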

To fully understand the beauty of the balloon driver, it’s crucial to understand that the VMkernel is not aware of the guest OS’s internal memory management mechanisms. Guest OSes commonly use an allocated memory list and a free memory list. When a guest OS makes a request for a page, the VMkernel will back that “virtual” page with physical memory. When the guest OS stops using the page internally, it does not remove the data; the guest OS just removes the address space pointer from the allocated memory list and places this pointer on the free memory list. Because the data itself has not changed, ESX will keep this data in physical memory.
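The mismatch between the guest's view and the host's view can be shown with a toy model. The Host and Guest classes below are hypothetical and exist only to illustrate the bookkeeping described above; they do not reflect how any real guest OS or the VMkernel is implemented.

```python
class Host:
    """Tracks which guest pages are backed by host physical memory."""

    def __init__(self):
        self.backed = set()

    def touch(self, page):
        # The host backs a guest page with physical memory on first use.
        self.backed.add(page)


class Guest:
    """A guest OS with an allocated list and a free list."""

    def __init__(self, host, num_pages):
        self.host = host
        self.free_list = list(range(num_pages))
        self.allocated = set()

    def alloc(self):
        page = self.free_list.pop(0)
        self.allocated.add(page)
        self.host.touch(page)
        return page

    def free(self, page):
        # Only the guest's bookkeeping changes. The data is not cleared
        # and the host is never told, so the page stays backed.
        self.allocated.remove(page)
        self.free_list.append(page)
```

After a guest allocates four pages and frees three of them, the guest considers one page in use while the host still backs all four. That gap is exactly the memory the balloon driver can hand back.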

When the balloon driver is utilized, it requests the guest OS to allocate a certain number of pages. Typically the guest OS will allocate memory that has been idle or is registered in the guest OS free list. If the virtual machine has enough idle pages, no guest-level paging or, even worse, VMkernel-level swapping is necessary. Scott Drummonds tested an Oracle database VM against an OLTP load generation tool and researched the (lack of) impact of the balloon driver on the performance of the virtual machine. The results are displayed in this image:

[Figure: Impact on performance: ballooning versus swapping]

Scott’s explanation:
Results of two experiments are shown on this graph: in one memory is reclaimed only through ballooning and in the other memory is reclaimed only through host swapping. The bars show the amount of memory reclaimed by ESX and the line shows the workload performance. The steadily falling green line reveals a predictable deterioration of performance due to host swapping. The red line demonstrates that as the balloon driver inflates, kernel compile performance is unchanged.

So the beauty of ballooning lies in the fact that it allows the guest OS itself to make the hard decision about which pages to page out, without the hypervisor’s involvement. Because the guest OS is fully aware of the memory state, the virtual machine will keep on performing as long as it has idle or free pages.

When ballooning is disabled
When we follow the recommendation to disable the balloon driver, the VMkernel is left with the following memory reclamation techniques:

Transparent Page Sharing
Memory compression
Host-level swapping (.vswp)

Memory compression
Memory compression was introduced in vSphere 4.1. The VMkernel will always try to compress memory before swapping. This feature is very helpful and a lot faster than swapping. However, the VMkernel will only compress a memory page if it can reach a compression ratio of 50% or more; otherwise the page will be swapped. Furthermore, the default size of the compression cache is 10% of the virtual machine’s memory. If the compression cache is full, a compressed page must be evicted to make room for a new page, and the older pages are swapped out. This means that during heavy contention, memory compression becomes merely the first stop before a page ultimately ends up swapped.

Increasing the memory compression cache can be counterproductive. Because the compression cache is part of the virtual machine’s memory usage, configuring a large cache can itself introduce memory pressure or contention.
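The compress-or-swap decision described in this section can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a 4 KB page size, zlib as a stand-in compressor (the VMkernel's actual algorithm is not covered in this article), and a cache whose capacity is counted in pages rather than bytes. The CompressionCache class and its method names are hypothetical.

```python
import os
import zlib
from collections import OrderedDict

PAGE_SIZE = 4096                 # assumed 4 KB page
MAX_COMPRESSED = PAGE_SIZE // 2  # page must shrink to half its size or less


class CompressionCache:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.cache = OrderedDict()  # page id -> compressed bytes, oldest first
        self.swapped = []           # page ids that ended up in the swap file

    def reclaim(self, page_id, data):
        """Compress the page if it shrinks enough, otherwise swap it."""
        compressed = zlib.compress(data)
        if len(compressed) > MAX_COMPRESSED:
            self.swapped.append(page_id)
            return "swapped"
        if len(self.cache) >= self.capacity:
            # Cache is full: the oldest compressed page is swapped out.
            oldest_id, _ = self.cache.popitem(last=False)
            self.swapped.append(oldest_id)
        self.cache[page_id] = compressed
        return "compressed"


if __name__ == "__main__":
    cache = CompressionCache(capacity_pages=2)
    print(cache.reclaim(1, bytes(PAGE_SIZE)))       # all-zero page
    print(cache.reclaim(2, os.urandom(PAGE_SIZE)))  # random, hard to compress
```

The sketch shows why compression only delays swapping under sustained pressure: once the cache is full, every newly compressed page pushes an older one out to the swap file.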

Host-level Swapping
Contrary to ballooning, host-level swapping does not communicate with the guest OS. The VMkernel has no knowledge of the status of the page in the guest OS; it only knows that the physical page belongs to a specific virtual machine. Because the VMkernel is unaware of the content of the data stored inside the page and its significance to the guest OS, it can end up swapping out guest OS kernel pages. The guest OS itself will never swap kernel pages, as they are crucial to maintaining kernel performance.
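The difference between the two approaches comes down to who picks the pages. The sketch below contrasts them under simplifying assumptions: pages are plain integers, the guest is assumed to prefer free pages, then idle pages, then active pages, and the host is modelled as picking blindly. Both function names are hypothetical.

```python
import random


def balloon_reclaim(free_pages, idle_pages, active_pages, target):
    """Guest-side choice: hand over free pages first, then idle pages.
    Active pages are only touched as a last resort."""
    ordered = list(free_pages) + list(idle_pages) + list(active_pages)
    return ordered[:target]


def host_swap_reclaim(all_pages, target, rng):
    """Host-side choice: with no knowledge of the guest's memory state,
    pages are picked blindly, so active or kernel pages can be hit."""
    return rng.sample(list(all_pages), target)
```

With four free, four idle and four active pages and a target of six, balloon_reclaim never touches an active page, while host_swap_reclaim offers no such guarantee.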

By disabling ballooning, you have just deactivated the most intelligent memory reclamation technique. This leaves the VMkernel with the option to either compress a memory page or just rip out completely random (and possibly crucial) pages, which significantly increases the chance of degrading virtual machine performance. To me, that does not sound like something worth recommending.

How to guarantee performance without disabling the balloon driver?
The best option to guarantee performance is to use the resource allocation settings: shares and reservations.

Shares

Use shares to define priority levels and use reservations to guarantee physical resources even when the VMkernel is experiencing resource contention.

Reservation
A reservation specifies the guaranteed minimum allocation for a virtual machine. This means that the VMkernel does not reclaim physical memory that is protected by a reservation, even if there is contention. This physical memory will be available to that specific virtual machine at all times. In essence, by applying a memory reservation to a virtual machine, you are disabling memory reclamation for that chunk of virtual machine memory.
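As a quick worked example of what a reservation protects, the hypothetical helper below computes how much of a virtual machine's configured memory remains eligible for reclamation. It is plain arithmetic based on the paragraph above, not a call into any vSphere API.

```python
def reclaimable_mb(configured_mb, reservation_mb):
    """Memory that the host may still reclaim from a virtual machine:
    everything above the reservation, up to the configured size."""
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and configured size")
    return configured_mb - reservation_mb


# A 16 GB virtual machine with a 12 GB reservation leaves 4 GB that
# ballooning, compression or swapping can still target.
print(reclaimable_mb(16384, 12288))
```

Setting the reservation equal to the configured memory size brings the reclaimable amount to zero, which is the outcome people usually hope for when they disable the balloon driver.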

Additional information about reservations:

However, setting reservations does impact the virtual infrastructure. A well-known impact of setting a reservation is on the HA slot size if the cluster is configured with the “Host failures cluster tolerates” policy. More info on HA can be found in the HA deep dive on yellow-bricks. To circumvent this impact, one might choose to configure the HA cluster with the HA policy “Percentage of cluster resources reserved as fail over spare capacity”. Due to the HA-DRS integration introduced in vSphere 4.1, the main caveat of this policy, dealing with fragmented clusters, is resolved.

Conclusion
Disabling the balloon driver will likely worsen the performance of the virtual machine and drive the problem down the stack. If you want to disable memory reclamation for a virtual machine, apply reservations.
