

Memory Overcommitment in the Real World

There has been a lot of talk on this blog and others about memory overcommit. Several of those blogs have tried to discredit memory overcommit by saying it’s never used by customers in the real world and that its benefit can simply be matched by adding more memory to the server (see the long discussions in previous posts on this blog). One VMware customer has already commented about how he architects his environment with memory overcommit in mind. I received a private response from another VMware customer who is getting ready to implement a very large VDI environment, with some interesting numbers that I’ll walk through below. The customer is a large US bank, and to protect the security of their environment I have blacked out some of the names in the screenshots.

About the Environment

This customer configured their standard Windows XP call center environment to run in virtual machines. Each virtual machine is granted 512 MB of memory and 1 virtual CPU. Each VM runs a set of applications including Marimba, Microsoft Office, a call recording application, a customer database application, and a BPO (business process outsourcing) application.

The host is an IBM x3850 M2, a quad-socket, quad-core system (16 cores total). The server is configured with 64 GB of physical RAM and has 32 memory slots.

At the time, the system was actively being used by call center employees, as reflected in this CPU utilization graph.

[Screenshot: cpu chart.jpg]

Running the Numbers

Below is a screenshot of their environment showing a total of 178 VMs running on the system. You can also see in the screenshot that less than 20 GB of the total 64 GB of RAM is being used on the system. With 178 VMs configured for 512 MB of RAM each, they are currently allocating 89 GB of memory to running VMs, which means the host’s memory is oversubscribed (roughly a 1.4:1 overcommit ratio).

[Screenshot: VDI.jpg]
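
To make the overcommit math explicit, here is a minimal sketch using only the figures above (178 VMs at 512 MB each on a 64 GB host); the variable names are mine and purely illustrative.

    # Overcommit math for the environment shown above (purely illustrative).
    vm_count = 178
    vm_memory_gb = 512 / 1024        # each VM is granted 512 MB
    host_memory_gb = 64              # physical RAM in the host

    allocated_gb = vm_count * vm_memory_gb            # memory promised to VMs
    overcommit_ratio = allocated_gb / host_memory_gb

    print(f"Allocated to VMs:  {allocated_gb:.0f} GB")      # 89 GB
    print(f"Physical RAM:      {host_memory_gb} GB")
    print(f"Overcommit ratio:  {overcommit_ratio:.2f}:1")   # ~1.39:1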

Below are screenshots of the price for an identically configured host, taken from IBM’s website on March 18, 2008.

[Screenshot: VDI Configuration Summary.png]

[Screenshot: VDI Configuration Memory.png]

Total cost for the server is $24,623.00. Remember that this box is overcommitted on memory. In order to run the same setup with a competitive solution, we would need a server configured with at least 89 GB of RAM, the total allocated to all of the running virtual machines. Going back to IBM’s website and reconfiguring the server with more memory gives us the following screenshots. The closest configuration that supported at least 89 GB of RAM was 92 GB, because of the memory configurations allowed in the server.

[Screenshot: VDI Configuration - Competitor.png]

[Screenshot: VDI Configuration Memory - Competitor.png]

The new cost of the server is $36,423.00. Compared to the original configuration, that is a difference of $11,800.00 just to add enough memory to support a solution that does not have memory overcommit. The list price of a 4-socket VMware VI3 Enterprise license is $11,500.00. As you can see, the VMware license actually costs $300 less than the additional memory. Not much of an advantage on the cost side, but it still drives home the point that the VMware solution costs no more than the competitive "free" solution. What’s more, you now get all of the enhanced functionality of the VMware solution that the competitive solutions are lacking. But enough of the marketing pitch; let’s go back to the numbers and this customer case study.
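
As a quick sanity check on that comparison, here is a minimal sketch using only the IBM list prices and the VI3 Enterprise list price quoted above; nothing beyond those published numbers is assumed.

    # Cost of adding memory vs. cost of a VI3 Enterprise license (list prices above).
    host_64gb = 24_623.00     # x3850 M2 with 64 GB (overcommitted configuration)
    host_92gb = 36_423.00     # same server configured with 92 GB
    vi3_license = 11_500.00   # 4-socket VI3 Enterprise list price

    extra_memory = host_92gb - host_64gb             # $11,800.00
    license_advantage = extra_memory - vi3_license   # $300.00 in VMware's favor

    print(f"Extra memory cost: ${extra_memory:,.2f}")
    print(f"The license costs ${license_advantage:,.2f} less than the extra memory")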

We’ve shown you a real-world scenario from a real customer where adding more memory to the server is still more expensive than the VMware license. I want to take this one step further and show what designing the architecture to more closely match the true working set size could do. Remember that we’re using less than 20 GB of RAM to run this environment of 178 VMs. We went back to the IBM website and configured the starting point for a VMware solution to have at least 20 GB of RAM. The closest we could get was a host with 24 GB of RAM, as seen below.

[Screenshot: VDI Cheaper Starting Configuration.png]

[Screenshot: VDI Cheaper Starting Memory.png]

As you can see, by truly engineering the solution to what you actually need, we’ve reduced the hardware cost to $20,023.00. Comparing that to the competitive solution’s hardware cost gives a difference of $16,400.00 for the extra memory. Subtracting out the cost of the VMware license, the VMware solution comes in $4,900.00 cheaper than the competitive solution.
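
Extending the same sketch to the right-sized 24 GB configuration shows where the $4,900.00 figure comes from; again, only the list prices quoted above are used.

    # Right-sized VMware host (24 GB) plus license vs. the 92 GB competitive host.
    host_24gb = 20_023.00     # x3850 M2 configured with 24 GB
    host_92gb = 36_423.00     # competitive configuration with 92 GB
    vi3_license = 11_500.00   # 4-socket VI3 Enterprise list price

    hardware_gap = host_92gb - host_24gb        # $16,400.00 for the extra memory
    net_savings = hardware_gap - vi3_license    # $4,900.00 saved with VMware

    print(f"Hardware difference: ${hardware_gap:,.2f}")
    print(f"Net savings with the VMware solution: ${net_savings:,.2f}")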

It’s All for Charity

I would love for us to give all of that $4,900.00 to charity, but the customer wouldn’t go for that. Instead they’re using it to pay for some of the thin clients they’ll use in this solution. After all, the $4,900.00 they saved by going with a VMware solution over a competitive "free" solution will buy them 33 thin clients at their current price.

The good news from all of this is that James O’Neill from Microsoft was kind enough to volunteer $270 of his own money if we could show a real customer running with memory overcommit where adding memory to the server wouldn’t make the Microsoft solution still cheaper than VMware. I think we’ve adequately met that goal here. So, James, the charity of choice is One Laptop Per Child. And just in case you believe we’ve cherry-picked a use case, we’ll be more than happy to connect you directly by phone with any one of the numerous customers leveraging memory overcommitment in their environments today.

15 thoughts on “Memory Overcommitment in the Real World”

  1. VMTN Blog

    Memory Overcommitment in the Real World

    We really think VMware Virtual Infrastructure gives a huge amount of value and features compared to other virtualization solutions on the market. Our customers tell us the ROI is high, the time to recoup their costs is small, and a…

  2. James O'Neill

    Mike, if you look at the follow-up post I did to the one you linked to, you’ll see the ground we agree on: IF you can get a high enough over-commit ratio AND you start with enough RAM in the system, then using software in place of RAM can be cost effective, and (tip of the hat to you) you’ve found a case where you’re doing exactly that.
    If/when this is *In Production*, not “getting ready” as it says in the first paragraph, I’ll pay up. (Get the customer to drop me a mail.) If it’s not (yet), I’ll hold off. I’ll post to my blog when the prize has been claimed.

  3. vmguru

    Seriously? I cannot believe VMware allowed this bogus information to get posted.
    1. VMware has always stated a 128 virtual CPU limit per host. This makes the example above useless since that configuration would not be supported.
    2. With the 128 vCPU MAX limit the memory required drops back down to 64 GB… which is the first configuration in the example. 64 GB of RAM (using 2 GB DIMMs) is about $7,300 while 32 GB of RAM is half that… still cheaper to have more RAM than to pay for the hypervisor.
    3. How come the screenshots are at idle? Why not during real production load? Show the actual utilization of memory and CPU across both hosts 🙂
    Cute attempt… but ultimately flawed

  4. Mike DiPetrillo

    vmguru,
    128 vCPUs is the stated limit currently. We’re working with this customer (and several others) to certify a higher vCPU limit. Similar to the other virtualization vendors out there, 128 vCPUs is what we officially support now, but it’s not the limit of the architecture. We’re using this example because it’s a real-world deployment in line with several other VDI implementations being installed around the world.
    As for the CPU utilization: that’s just what a real VDI workload looks like. I’m still confused as to why everyone keeps thinking that a desktop workload is going to run at 100% utilization all the time. Has anyone really looked at the average utilization of their own system for an entire day? The industry average for a desktop system is 2% utilization. From the performance graph we have here you can see almost 100% utilization (across 16 cores) when the VMs are starting. During the actual user load on the systems (these are call center agents), average CPU was around 25-30% utilization. This is completely in line with every other VDI implementation I have been involved in (I’ve done over 100 and counting).
    So to recap, this is real, from a real customer. I could go and get stats from a few hundred other customers like this and post them, but that’s just a waste of browser space. If you’re not going to believe one customer case study, I doubt I’ll be able to convince you with 100 of them.
    In the meantime we have started a nice thread in the community forums for people to write in with their stories. One is already there and more are coming. http://communities.vmware.com/message/887472 .

  5. Sheng Sheen

    I was just at a customer site today, and they are running with memory overcommit as well. Yes, they are in production and have been running for over 2 years with it without any problems.

  6. rvargas

    I just put together a VI3 infrastructure here at my job with 16 GB of RAM on one ESX host, but I have overcommitted 22 GB of RAM to the VMs. My VMs range from Fedora Core 6, to Windows Server 2003 R2 Enterprise (some running Citrix PS4.5), to Windows XP SP2.

  7. Blinkiz

    The numbers are right but slanted.
    The low memory usage is not due only to the “memory overcommit” feature. I think the majority of the low memory usage comes from memory ballooning. Also add to this RAM that is not even allocated in a VM.
    Other hypervisors also have ballooning or something similar.
    Please, someone, show me figures that do not include memory ballooning, unallocated RAM, and RAM swapped to disk. Just purely what the “memory overcommit” feature can do.

  8. Mike DiPetrillo

    Memory overcommit is only the act of oversubscribing memory. There are several features in play that allow for memory overcommitment: transparent page sharing, memory ballooning, and paging are all components that make it possible. First of all, not all hypervisors have memory ballooning. The Xen open source hypervisor is the only one with memory ballooning today, and it’s not even picked up by any of the commercial distributions that use Xen (Citrix, Red Hat, Novell, etc.). That aside, memory ballooning only kicks in when there’s pressure on memory resources, which happens *after* transparent page sharing (a VMware exclusive) has freed up a lot of memory. I should also mention that just because a guest isn’t using memory doesn’t mean it hasn’t been assigned. Only transparent page sharing allows free and identical memory pages to be consolidated and truly allows for high-performing VMs while oversubscribing host memory.
    So really there is no “memory overcommit” piece of technology – it’s just a feature that’s made possible by transparent page sharing, memory ballooning, and paging. There’s a really good academic paper from the patent holder for these features here: http://www.vmware.com/resources/techresources/531.

  9. Blinkiz

    KVM has transparent page sharing, memory ballooning, swapping, paging, processor overcommit, paravirtualization drivers for block and network devices, live migration, support for tons of third-party drivers/devices, dynamic load balancing, a distributed file system, and a lot more.
    BUT! KVM does not have any good centralized management GUI for the enterprise. As far as I know, not a single one exists.
    In my previous post I talked about “memory overcommit” like a standalone feature. My bad. English is not my main language.

  10. Mike DiPetrillo

    Blinkiz,
    Currently VMware is the only one to implement the features you describe. KVM does have a lot of this on their roadmap which can be found here: http://kvm.qumranet.com/kvmwiki/Roadmap. Most other hypervisors (Microsoft and Xen) also have some of these features on their roadmap for 2 – 3 years from now. All of this further validates the value of these features.
    It will be interesting to see how the other hypervisors implement the technologies since VMware currently holds 4 patents around memory technologies which don’t start to expire until 2017.
    P.S. Don’t worry about the English as a second language. I wish I spoke more languages. Indeed, a lot of these posts talk about memory overcommit as a feature, as if it alone were a piece of technology. I guess you could call it a feature, but it’s made possible by other underlying technologies.

  11. Blinkiz

    Hi there, Mike
    The roadmap is outdated. All features under the memory section are already implemented.
    Still, KVM is a new product/solution that has not been tested as much as VMware VI3. Currently I cannot recommend that any company run KVM. But watch out for KVM in the future! The approach of using the Linux kernel as the virtualization engine is very effective. KVM = Kernel-based Virtual Machine.

  12. Cam Merrett

    Great output. We just completed a white paper on the x3850 M2 to test consolidation ratios. We hit the 128 limit that VMware put in VI3 v3.5 and the box was running at 41% utilisation. Then VMware allowed us to release the limit and we took it up to 227 with all VMs running user scripts to simulate real-world usage. This blog shows it works in the real world.
    One thing we have seen as well with VDI is that because the performance of this machine is so suited to VDI, applications can process requests so fast they actually require fewer resources. In one test we used an application with an Excel front end and an Oracle back end. This application would take a normal 2-processor workstation with 4 GB of RAM up to 100% utilisation of RAM and CPU. When we virtualised it, the VM was set to 2 vCPUs and 1 GB of RAM and sustained only 52% utilisation. It is not just memory oversubscription; it is the underlying Hurricane processor the M2 has that gives us these consolidation ratios. Our final calculation was power consumption. Stick with 120 VMs per host, a Wyse terminal, and a 17″ LCD screen, and we get 56.01 watts per user. Thanks for validating our white paper. Be careful about upgrading your version of ESX, as we did hit the 128 limit and could not power on the 129th VM with the latest version.

  13. Ruediger Gerolf Biernat

    Blinkiz:
    This is an old thread but take a look at http://pve.proxmox.com – you will be surprised how good a GUI can look for KVM. 🙂
    And it even supports OpenVZ for small servers with one purpose only.
    Greetings,
    Ruediger G. Biernat

  14. Wyatt

    Funny that this is called “Virtual Reality” and claims to be setting the record straight. It is anything but that. I was the person who took these screenshots!
