

Memory Overcommit – Real life examples from VMware customers

Memory overcommit, aka the ability of VMware ESX to provision memory to virtual machines in excess of what is physically available on a host, has been a topic of discussion in virtualization blogs for quite some time (e.g., “More on Memory Overcommitment”) and apparently still is (e.g., “VMware vs. Microsoft: Why Memory Overcommitment is Useful in Production and Why Microsoft Denies it” and “Microsoft responds to VMware’s ability to overcommit memory”).

Given the benefits of memory overcommit and the fact that today only VMware ESX/ESXi offers it as a standard feature, it is understandable that other vendors try to downplay it by claiming that it is irrelevant, dangerous, and not used in production environments. Microsoft’s position on the topic is particularly interesting… or confusing, I should say. On one side, Microsoft VP Bob Muglia confirmed the usefulness of memory overcommit in an interview, announcing plans to add it to their hypervisor some time in the future (have you heard this line before?); on the other side, they don’t miss an opportunity to speak against it. James O’Neill, also from Microsoft, even challenged us on his blog to provide a reference to a customer actually using it in production, promising in return to make a charitable donation of $270 to an organization of our choice.

Anyway, internally at VMware we certainly have no doubts about the importance and effectiveness of memory overcommit, but we felt that after all this discussion among vendors, and after all the confusion from Microsoft as to whether it is/isn’t important and is/isn’t on the Hyper-V roadmap, it might be more interesting for you to hear directly from our customers. Therefore, the bulk of this post will document a survey of memory overcommit usage among VMware customers. You’ll hear directly from VMware users about how they leverage ESX memory overcommit in their production datacenters, with no impact on performance, to increase VM density and further reduce VMware’s already low cost per application – the most relevant metric of virtualization TCO.

(Side note: I bet MSFT will no longer question the value of overcommit once they are finally able to list it as an upcoming Hyper-V feature.)

Before jumping into the survey results, I think a few clarifications are necessary.

What is memory overcommit?

Here I won’t go into all the granular, technical details of how memory overcommit works, because there is already a ton of great literature available that explains what it is and how it works (e.g., “The Role of Memory in VMware ESX Server 3” ).

However, there are a couple of points that I’d like to make regarding the functionality of, and requirements for, memory overcommit.

Memory Overcommit: Required Components

Memory overcommit is the combination of three key ingredients:

  1. Transparent memory page sharing
  2. Balloon driver
  3. Optimized algorithms in the hypervisor kernel

These three elements must all be present and work together seamlessly. One alone is not enough, contrary to what some vendors would like people to believe (see “Ballooning is more than enough to do memory overcommit on Xen, Oracle says”). To date, only VMware ESX has all the necessary components; it has had them since 2001 and has continued to improve them ever since.
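
For intuition on how ingredient 2 cooperates with the hypervisor, here is a toy Python model. This is only a sketch – all class names and numbers are invented for illustration – of the ballooning idea: a driver inside the guest OS pins idle pages, which tells the hypervisor it can safely reclaim them and back other VMs with that RAM.

    # Toy model of balloon-driver memory reclamation (invented names/numbers).
    class GuestVM:
        def __init__(self, name, configured_mb):
            self.name = name
            self.configured_mb = configured_mb  # memory the guest believes it owns
            self.balloon_mb = 0                 # guest pages pinned by the balloon

        def inflate_balloon(self, mb):
            """Pin `mb` of idle guest memory so the hypervisor can reclaim it."""
            self.balloon_mb += mb
            return mb

    class Host:
        def __init__(self, physical_mb):
            self.physical_mb = physical_mb
            self.vms = []

        def backed_mb(self):
            # Memory that must actually be backed by RAM right now.
            return sum(vm.configured_mb - vm.balloon_mb for vm in self.vms)

        def reclaim(self, needed_mb):
            """Inflate balloons across guests until `needed_mb` is freed."""
            freed = 0
            for vm in self.vms:
                if freed >= needed_mb:
                    break
                # In this toy, a guest can give back at most half its memory.
                spare = vm.configured_mb // 2 - vm.balloon_mb
                freed += vm.inflate_balloon(min(needed_mb - freed, spare))
            return freed

    host = Host(physical_mb=8192)
    host.vms = [GuestVM(f"vm{i}", 4096) for i in range(3)]  # 12 GB on an 8 GB host

    print(host.backed_mb())        # 12288 MB requested -> overcommitted
    host.reclaim(needed_mb=4096)   # under pressure, balloon idle pages back
    print(host.backed_mb())        # 8192 MB now fits in physical RAM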

Memory Overcommit: Security Impact

Transparent memory page sharing de-duplicates memory by sharing identical pages among VMs. In doing so, it marks the shared pages “read-only” at the physical RAM level. If a VM tries to write to a shared page, ESX receives a callback and creates a private copy of the page for the VM that wants to write, while letting the other VMs continue to use the original shared page. Marking shared pages read-only ensures that the technology is secure: one VM can never affect another. However, if you need additional assurance of memory overcommit’s security, keep in mind that VMware ESX, with its memory page sharing feature, is the only hypervisor on the market that has earned a Common Criteria Evaluation Assurance Level 4 (EAL4+) certification under the CSEC Common Criteria Evaluation and Certification Scheme (CCS). Therefore, only VMware ESX is approved for use in “sensitive, government computing environments that demand the strictest security.”
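
To give a feel for the copy-on-write mechanics described above, here is a miniature Python sketch. It is not ESX code – the data structures and names are invented for illustration, and real page sharing operates on machine pages using hashing plus a full byte comparison – but it shows the essential behavior: identical page contents are stored once, and a write gives the writing VM a private copy while the other VMs keep the shared original.

    # Miniature model of transparent page sharing with copy-on-write.
    import hashlib

    shared_store = {}            # content hash -> page bytes (one copy per unique page)

    class VMMemory:
        def __init__(self):
            self.pages = {}      # guest page number -> content hash

        def map_page(self, gpn, content):
            """Back a guest page, sharing it if identical content already exists."""
            h = hashlib.sha256(content).hexdigest()
            shared_store.setdefault(h, content)
            self.pages[gpn] = h

        def write_page(self, gpn, new_content):
            """Copy-on-write: remap this VM to a private copy; the shared page is untouched."""
            self.map_page(gpn, new_content)

    vm1, vm2 = VMMemory(), VMMemory()
    zero_page = bytes(4096)
    vm1.map_page(0, zero_page)
    vm2.map_page(0, zero_page)          # both VMs back page 0 with one physical copy
    assert vm1.pages[0] == vm2.pages[0]

    vm2.write_page(0, b"x" * 4096)      # vm2 gets its own copy on write...
    assert vm1.pages[0] != vm2.pages[0] # ...while vm1 still uses the shared original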

Why is memory overcommit important?

Memory overcommit enables customers to achieve higher VM density per host, increasing consolidation ratios and providing a more efficient scale-up/scale-out model. Ultimately this translates into substantial savings and a lower cost per application than with alternative solutions, as Eric Horschman shows in his blog post.

While the declining cost of memory could suggest that hypervisors with no memory overcommit can get away without it, in reality throwing more memory at the problem is not a sustainable solution for a few reasons:

  • The number of VMs deployed grows over time

  • Going forward, systems will be even more memory constrained than today, as the number of CPU cores per server will increase considerably faster than memory capacity. As a matter of fact, in 2011 a two-socket system is expected to be capable of 64 logical CPUs and 256GB of RAM, whereas today the same system is probably capable of 8 logical CPUs and 64GB of RAM – in other words, memory per logical CPU would drop from 8GB to 4GB. This means that the ability of a hypervisor to efficiently manage memory will be an even more critical factor in minimizing the number of servers required to run applications and ensuring efficient scalability.

  • Memory capacity requirements aren’t determined only by application workloads, but also by a number of valuable IT services, such as high availability, zero-downtime system maintenance, power management, and rapid system provisioning. Virtualization solutions that don’t allow memory overcommit corner customers into a lose-lose situation: either reduce system utilization or don’t provide the service. Thanks to memory overcommit, our customers tell us that they were able to reduce their dependence on available physical resources, avoid unnecessary purchases, and improve infrastructure utilization. (See below for a few examples of how VMware customers use memory overcommit.)

Enough with the clarifications – let’s move on to the customer survey…

We conducted an online survey of 110 VMware customers essentially asking them three questions:

  1. Do you use memory overcommit?
  2. Do you use memory overcommit in test/dev, production or both?
  3. What is your virtual-to-physical memory ratio per ESX host (i.e., overcommit ratio)?
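
To make question 3 concrete, here is how the ratio works out for a single host (the numbers are hypothetical, chosen for illustration; they are not taken from the survey):

    # Overcommit ratio = total configured VM memory / physical host memory
    vm_memory_gb = [4, 4, 8, 16, 24, 56]   # configured memory of each VM on the host
    physical_gb = 64                        # RAM installed in the ESX host

    ratio = sum(vm_memory_gb) / physical_gb
    print(round(ratio, 2))                  # 112 / 64 = 1.75 -> overcommitted

A ratio above 1 means the host has promised its VMs more memory than it physically has; page sharing, ballooning, and the VMkernel’s memory management make up the difference.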

Here are the results:

1) 57% answered they are using memory overcommit

[Chart: share of respondents using memory overcommit – yes vs. no]

…so much for “nobody uses it”

2) Of the 57% who answered yes, 87% said they use it in both production and test/development, 2% only in production, and 11% only in test/dev

[Chart: where memory overcommit is used – production, test/dev, or both]

…so much for “nobody uses it in production”

3) Finally, plotting the virtual-to-physical ratios on a chart, we can see what usage looks like. Virtual-to-physical memory ratios ranged from 1.14 to 3 (average 1.8, median 1.75). 75% of respondents use memory overcommit ratios of 1.5 or higher, and 37% use a ratio of 2 or higher.

[Chart: distribution of virtual-to-physical memory overcommit ratios]

…so much for “memory overcommit ratios must be low”

What the chart can’t show is that, based on our findings, companies at the low end of the memory overcommit usage spectrum tend to be recent VMware customers, while those at the high end tend to be long-standing VMware customers. This looks very similar to what we have seen with other VMware technologies such as VMotion: once people try it and see how well it works, they want to extract its full potential.

I believe this data clearly demonstrates that VMware customers use memory overcommit in production systems and do so with high virtual-to-physical ratios.

Finally, here is what a few customers who use memory overcommit in production have to say about it:

Kadlec Medical Center – Large 188-bed hospital in southern Washington State with over 270 medical staff members and over 10,000 annual patient admissions.

“Memory overcommit is one of the unique and powerful features of VMware ESX that we leverage every day in our production environments. Thanks to memory overcommit, we were able to increase the consolidation of our production environment by over 50%, maximizing utilization without giving up on the performance of our production systems. We appreciate that VMware makes it available to customers as a standard feature of ESX” – Tim Harper, Sr. System Analyst, Kadlec Medical Center

WTC Communications – a regional phone, cable, and Internet provider in Kansas

"A small business like ours derived tremendous benefits from the ability of VMware ESX to overcommit memory. We cannot afford the big IT budget of a large enterprise, so we must get the most out of our production servers while guaranteeing SLAs with our customers. This is exactly what VMware ESX memory overcommit allowed us to achieve. We were able to consolidate 35 production virtual machines (both Linux and Windows) on just 3 Dell PowerEdge 2850 servers with 8GB RAM each. Typically we run our production servers at an average ratio of 1.25 virtual-to-physical memory, however during maintenance operations, the ratio increases to 1.88 as we VMotion VMs out of the host that undergoes maintenance completely transparently to the users. Memory overcommit adds unparallel flexibility to our infrastructure and saves us a lot of money not just by allowing higher consolidation, but also by eliminating the need for spare capacity to perform routine maintenance operations. Memory overcommit is a fully automated feature of ESX and it is extremely simple to use. It is really a no brainier.” — Jim Jones, Network Administrator, WTC Communications

U.S. Department of Energy – Savannah River

"Our virtualization effort began 4 years ago, and we have made great strides in server utilization since then. After upgrading to VI3, we took advantage of VMware memory overcommit. We now routinely overcommit memory at a 2:1 ratio in our production environments and have even reached 3:1 on occasion. We even run large applications such as Lotus Domino and SQL server 2008 in VMs but this has not been an issue – no performance impact. As a result, we fully trust VMware memory overcommit in our production environments. Our IT budget is tight so in the past we have had to wait over 6 months to receive a new server. By using memory over commit, we can now deploy a system in less than 30 minutes without waiting for a new server. This keeps our internal customers very happy," – Joseph Collins, Senior Systems Engineer, U.S. Department of Energy – Savannah River

10 thoughts on “Memory Overcommit – Real life examples from VMware customers”

  1. Channel-V

    Hi Alberto,
    A nice article, but there is one thing that bothers me.
    The part on Bob Muglia and his interview with Steven Bink on whether or not Microsoft will integrate ‘overcommit’ into their solution.
    I have read the article from Steven, and Bob Muglia responds “we definitely need to integrate that”, but he responds to a question on “shared memory”.
    Shared memory is in my opinion not the same as overcommitting, as sharing memory does not mean you will be overcommitting.
    What is your opinion when it comes to comparing “shared memory” to “memory overcommit”?
    Regards,

  2. Alberto

    Thanks for the question, Channel-V.
    I agree with you that “shared memory” is not equal to “memory overcommit”. Ultimately, though, you share memory because you want to be able to overcommit. As I wrote, there are several other memory management features necessary to support memory overcommitment. The white paper I point to in the blog gives you all the details. “Shared memory”, or “Transparent Page Sharing” (TPS) as we call it at VMware, is a fundamental piece of the puzzle and one that greatly improves the efficiency with which you can overcommit memory.
    Maybe we should simply ask Microsoft to further confirm or deny that overcommit is on the Hyper-V roadmap.

  3. James O'Neill

    I asked VMware to come back with a customer who was in production at the time of a previous post on the subject, and using a ratio of 2:1 or higher. At the time, no one from VMware could produce such a customer, so the money’s gone to the NSPCC – Britain’s leading children’s charity.
    Incidentally, I wouldn’t read BobMu’s words quite as literally as you are doing. Yes, we hear customers want to be clever with memory, but don’t take that to mean a direct copy of the feature as implemented in VMware.

  4. Andy

    If you read James’ original wager, it/he didn’t specify that the customer had to be in production at the ‘time of the post’ – he just challenged someone to show him a customer who used this featues. “If someone can show me a customer who is running, in production, a VMware VI3 Enterprise system with a 2:1 memory overcommit ratio on all the VMs, where spending the cost of VMware on RAM wouldn’t remove the need to use overcommitment then I’ll give… lets say $270 to their choice of charity.” (http://blogs.technet.com/jamesone/archive/2008/03/13/expensive-hypervisors-a-bad-idea-even-if-you-can-afford-them.aspx)
    The NSPCC donation is a nice touch but I think the outstanding $270 to OLPC is still owing…

  5. Tim Stephan

    Andy – thanks for the clarification, I think you’re right – there was no such restriction! :) But no worries on the bet – our aim wasn’t to make James pay, but to demonstrate that 1) memory overcommit is a safe technology that people are leveraging in their production datacenters and 2) as such, memory overcommit is one reason why VMware solutions offer the lowest cost per application of any virtualization solution – even better than “free” solutions.
    But, it is interesting that James says that we shouldn’t take Bob Muglia quite so literally regarding memory overcommit – hmmm… so James, on what other topics should we not take Bob so literally?? When he says that Hyper-V will have live migration with Windows Server 2008 SP2? Or when he says that Microsoft virtualization solutions supposedly cost 1/3 the price of VMware’s? Or when he said that Hyper-V could somehow achieve I/O throughput four times that of ESX? Anything else? ;)

  6. Andy

    Since he doesn’t appear to want to honour his original wager, perhaps we shouldn’t take Bob or James’ words James quite so literally?!

  7. Steve

    I am an engineer for a leading integrator. We’ve had a few clients interested in Hyper-V for test/dev environments. This isn’t surprising given Microsoft’s marketing of current-gen Hyper-V as “good enough” virtualisation. Probably not good enough for most prod, but good enough for test/dev.
    Well, Microsoft, no it’s not good enough. Memory overcommit is even more useful in test/dev environments, where the hardware budget is low but every application owner and his dog wants to run up a new VM every week.
    So don’t say no one uses it. My weekend will be spent hacking the ESXi installer to replace Hyper-V on my whitebox purely because I can’t overcommit my memory.

  8. James Russell

    It’s not difficult to use higher overcommit ratios if you allocate more memory than is really needed to each guest in the first place…

  9. Gene

    James R: which is what people typically do when they create VMs. So, are you saying that when people use Hyper-V they should spend lots of time making sure that each VM is allocated just the right amount of RAM, instead of letting ballooning and TPS do the work?

  10. Alberto Farronato

    Gene, great point. To build on it, I’d like to add that VMware memory overcommit isn’t a one-time process that happens just when you deploy a new VM. On the contrary, it is a continuous process that optimizes memory allocation in real time. I believe it is not realistic to think that everything can be planned in advance or that people will go back to the drawing board every time something changes, as James seems to suggest. Virtualization (at least the way VMware does it) doesn’t only increase resource utilization, but also the flexibility with which resources are used, without requiring manual tasks. On the other hand, Hyper-V forces people to operate much as they did before virtualization, when resources were pre-allocated at deployment and changes could be made only by taking an application offline.
