
Author Archives: Eric Horschman

2010 – Another Great Year for VMware and Our Customers

The film industry awards season is just reaching its peak, but the software industry’s own Oscars for 2010 have been handed out and VMware finished up with another great year.  Here’s a sampling of the recognition VMware received from the IT trade press in 2010:

Those same members of the technical press had some very nice things to say about VMware and our products in 2010:

  • “Microsoft won’t come anywhere close to leapfrogging VMware in 2010” – Redmondmag.com
  • “VMware is in fact leaps and bounds ahead of competitors” – InfoWorld
  • “…the ESX feature set is substantially more mature…VMware remains the champ.” – InformationWeek

and my favorite…

  • “VMware is the king of virtualization. There is none higher.” – TechTarget

The analysts also directed much praise our way in 2010.  Gartner came out with their eagerly anticipated Magic Quadrant for x86 server virtualization and one look at the upper right corner of the chart would warm the heart of any IT manager who selected vSphere to be the foundation of his company’s virtualization and cloud efforts.  In December, I was in attendance at Gartner’s annual Data Center Conference where audience polls showed 87% of attendees were using VMware as their primary x86 virtual machine solution in 2010 – only 2% were using the nearest competitor.  Even better, when asked which virtualization provider they would be using in 2015, the Gartner attendees picked VMware by a 5:1 margin over the nearest alternative.

One thing our customers can find especially satisfying about this recognition is that it confirms they made the right choice with VMware.  Whenever a customer invests in enterprise software, they want assurance that they will see steady improvements – big and small – in their chosen product.  In the case of VMware vSphere, our users have really won the lottery.  For almost ten years, each update and major release of our vSphere product family has rewarded our customers with improvements in performance and scalability as well as a steady stream of new capabilities – things like Storage vMotion, Fault Tolerance and Distributed Switches.  I imagine our early adopters who selected ESX or VMware Infrastructure are feeling pretty good about how their decision was rewarded.

But 2010 is behind us and we’re now focused on investments in upcoming waves of VMware products.  Our current customers and customers-to-be can feel assured that we’re working to keep up our pace of innovation (and recognition!) on into 2011 and beyond.

Considering XenServer for XenApp? It might be time for a “Virtual Reality Check”.

If you haven't already done so, now might be a good time to surf over to the Project Virtual Reality Check site (registration required).  No, it's not a web page dedicated to proofreading this blog.  It's run by a team of virtualization consultants that have been publishing comparisons of VDI and Terminal Services performance in various hypervisors.  They made a big splash back in January with their first set of reports comparing ESX 3.5, XenServer 5.0 and Hyper-V R1.  Our exclusive ability to overcommit memory made ESX the clear winner in the VDI tests with support for more than twice as many VMs as the others.  On the Terminal Services tests, it was XenServer that came out ahead in the number of user sessions it could handle.  It wasn't good news for VMware, but the story doesn't end there.

Nothing gets us more worked up than finishing second in a product comparison, so our performance team started digging into the Project VRC benchmark to see what might be going on.  Right off the bat, we saw that ESX wasn't configured to use the hardware assist for memory management (AMD RVI) that was present on Project VRC's servers.  ESX 3.5 was the first hypervisor to support hardware assist for memory management and we certainly wanted ESX to look its best.  The Project VRC guys updated their ESX test report in March and the findings showed that ESX narrowed the XenServer lead in Terminal Services user sessions by half once RVI was enabled.  It was an improvement, but we still weren't ready to accept second place.

The Project VRC results prompted us to do some rigorous testing of ESX performance with XenApp — a workload based closely on Terminal Services — and our findings convinced us that we could support more user sessions, so what was going on with the Project VRC tests?

It turns out that the Project VRC team was also taking a second look at their benchmark.  Our performance team (and also the experts at Citrix) collaborated with Project VRC's test architects and all groups came to the conclusion that timing issues were skewing the results.  Use of in-guest timers, one of the classic demons in virtualization benchmarking, was identified as a problem area.  You can see our take on it here.

The Project VRC team dug in and developed a completely new Terminal Services benchmark that reduces reliance on in-guest timing.  The new workload still depends on in-guest timers, but it is a big improvement over the first version, and we're grateful for Project VRC's efforts and flexibility.  VMware has also been hard at work improving performance in vSphere since the first Project VRC tests.  Our own testing demonstrated a 30% improvement in XenApp throughput between ESX 3.5 and 4.0, so we expected a strong showing in the new test runs.

When the first Project VRC comparisons came out, we saw Citrix promote them heavily, especially with customers that were deciding on a virtualization platform for their XenApp servers.  XenApp and Presentation Server customers have been rapidly moving those servers into VMware virtual machines and it's understandable that Citrix wanted some independent test results that might slow down what they saw as defections and keep those XenApp servers on their hypervisor platform.

If you're a XenApp user making a virtualization platform choice, you need to read the new Project VRC paper that came out last week, titled "VRC, VSI and Clocks Reviewed".  The results have shifted dramatically since the January findings.  Instead of trailing, ESX 4.0 now holds a 4% advantage over XenServer 5.5 in the number of Terminal Services users it can support on a server.

We attribute the turnaround for vSphere in the latest tests to the Terminal Services/XenApp performance enhancements we made in ESX 4.0 and improvements made by Project VRC to their benchmark.  These new results will come as big relief to XenApp users that felt pressured to virtualize on XenServer after seeing the old Project VRC numbers.  When presented with test findings that seemed to show XenServer could support more users per host, it was harder to choose the proven reliability and richer feature set provided by VMware vSphere.  We certainly hope no customers were lured into making the wrong choice based on test results now known to be flawed and outdated, but those of you yet to make that decision will be happy to know that the latest Project VRC comparisons make the selection of vSphere a no-brainer.

Our position on hypervisor footprints, patching, vulnerabilities and whatever else Microsoft wants to throw into a blog post

I just came back from a few days vacation to find multiple emails in my inbox from VMware customers and partners looking for a response to a series of bizarrely rambling posts (here, here, here and here) on Microsoft’s virtualization team blog.  Normally, we’d avoid a tit for tat exchange, but the Microsoft postings contained some confusing and erroneous depictions of VMware technology that I hope to address and correct here.

Hypervisor Disk Footprints

We’ve consistently taken the position that a smaller hypervisor is inherently better and we’ve found that most people agree with us, including Microsoft’s Technical Fellow, Mark Russinovich (see his presentation from Burton Group’s Catalyst Conference in July.)  The reasoning is that every line of code unavoidably adds reliability and security risks.  Microsoft has cited those same benefits of "smaller attack surface" from code size reduction as the motivation for their slimmed down Server Core and Hyper-V Server alternatives.  We don’t know how many lines of code are in a Hyper-V system, so we use the installed disk footprint — the size of the installed files needed to support virtual machines — as a reasonable proxy for lines of code.  In calculating hypervisor disk footprints we need to follow a few rules to ensure consistency:

  • The installation must be sufficient to run VMs and support all advertised features.
  • Any operating systems in management partitions, Dom0s or service consoles needed by the hypervisor should be included.
  • Management of the hypervisor can be from a remote client, so local management clients can be excluded.
  • Pagefiles, swap partitions, scratch/temp partitions and core dump partitions can be excluded.

In the case of ESXi, here’s the sequence we followed to calculate its disk footprint:

  • Start by installing ESXi 4 on a bare server.
  • Use the vSphere Tech Support Mode to display the contents of the ESXi boot images in the /bootbank directory.
    esxi_ls_output
  • A df -h command will then show you that the total size of those compressed ESXi boot images in the directory corresponding to /bootbank is 59.3MB — somewhat less than the 70MB figure we’ve publicly stated.  The other partitions in the listing are either loaded only in memory (/), or they are excluded per the rules above.  Note that this is not just a stripped-down ESXi installation; it is a fully capable ESXi host supporting all licensed vSphere features.  (A rough sketch of this kind of measurement follows the list.)
    esxi_df-h_output
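For anyone who wants to total up an installed footprint the same way on another system, here is a minimal sketch of the approach: walk a directory tree, skip the kinds of paths excluded by the rules above (swap, scratch and core dump areas), and sum the file sizes.  The default path and the exclusion list are illustrative assumptions, not an official VMware measurement procedure.

```python
#!/usr/bin/env python
# Minimal sketch: total the size of installed files under a directory while
# skipping paths excluded by the footprint rules above.  The default path and
# the exclusion list are illustrative assumptions only.
import os
import sys

EXCLUDED_NAMES = {"core", "scratch", "tmp", "swap"}   # hypothetical exclusions

def footprint_bytes(root):
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded subtrees so their contents are never counted.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_NAMES]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):              # avoid double counting symlink targets
                total += os.path.getsize(path)
    return total

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/bootbank"
    print("Installed footprint under %s: %.1f MB"
          % (root, footprint_bytes(root) / (1024.0 * 1024.0)))
```

On an ESXi host the df -h output above already gives you the answer directly; a script like this is just a convenient way to apply the same exclusion rules consistently when measuring other hypervisors.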

For comparison, here’s a look at the disk footprint of ESX 4.0 "Classic", which measures about 1.7GB.  Most of the additional footprint is due to the Linux-based service console.
ESX4_Classic_footprint

The disk footprints we measured for Hyper-V R2 RTM are far larger.  Windows 2008 R2 Server Core with the Hyper-V role enabled, was 3.6GB.  For those Hyper-V users that want to preserve the "Windows they know," a full Windows Server 2008 R2 installation is pushing 10GB.  (The Hyper-V Server R2 RTM is not available to us yet, but we expect that its footprint will fall between that of Server Core R2 and full Windows Server 2008 R2, as it did for R1.)  For the graphically inclined, here’s a comparison that shows just how much less "surface area" ESXi presents to bugs and attacks.

Hypervisor_footprints

Yes, ESX "Classic" does use a Linux-based service console and therefore has a larger disk footprint, but VMware has publicly stated that the OS-free ESXi architecture is our future direction and ESXi has all the capabilities of ESX "Classic".  Microsoft has made no such commitments to eliminate Hyper-V’s dependency on Windows.  In fact, Microsoft CEO Steven Ballmer has stated that, "Our view is that virtualization is something that should be built into the operating system."  Not very encouraging for those hoping to see Hyper-V decoupled from the Windows monolith.

Microsoft’s explanation in their blog that, "our entire footprint which is made up mostly of stuff that isn’t exposed to VM traffic at all or only exposed indirectly," isn’t something I’d want to boast about and it’s exactly the thinking we wanted to get away from with the ESXi architecture.  We made ESXi exclusively dedicated to VM support and it doesn’t bring along the baggage of a general purpose OS.  Why would you want your hypervisor to be dependent on the proper functioning and security of tens of millions of lines of code that have nothing to do with supporting your VMs?

Hypervisor Patching

Microsoft’s blog then moves on to argue that patching is somehow less of a burden with Hyper-V because the aggregate size of its patches is smaller than for ESXi.  I’ll give them credit for creativity in coming up with that argument, but it’s really meaningless.  Because ESXi is installed and patched like an appliance — the entire image is replaced as a whole — our patches are naturally the size of the full ESXi installer package.  Our customers prefer that appliance approach because it ensures consistency in their installations and avoids "patch drift" away from a validated configuration.  With the Windows Update-based patching used for Hyper-V, patches can be smaller, but customers can skip or miss patches, resulting in insecure, partially patched configurations.

What really matters in patching is the number and disruptiveness of the patches.  With ESXi, we’ve dramatically cut down on the number of patches customers need to download and apply.  The biggest reason for the reduction is the elimination of the Linux-based service console.  The more frequent rate of ESX "Classic" patches is mainly due to our approach of playing it safe for customers by distributing patches for any issues affecting the Linux-based service console, even though most of those patches aren’t needed because the Linux services they address are normally disabled in an ESX installation.  Also, we do patch ESX "Classic" incrementally using the "surgical fix" approach with smaller patches that Microsoft seems to advocate.

With both ESX and ESXi, a host reboot following patching has always been a non-issue because VMotion and Maintenance Mode make it trivial to shift VMs to alternate hosts during the reboots.  Microsoft’s customers must certainly be looking forward to using those same features in the long-awaited release of Hyper-V R2.

However, what must really be frustrating to Hyper-V users is the need to constantly patch and reboot Hyper-V hosts with miscellaneous Windows Server 2008 patches that have nothing to do with virtualization.  Even if you use the stripped-down "Server Core" version of Windows Server 2008 that Microsoft recommends for production Hyper-V systems, you’re almost guaranteed to need a host reboot every "Patch Tuesday."  We’ve kept track of the "Patch Tuesday" patches required on a Server Core Hyper-V system since Hyper-V first shipped in June 2008, and there have been multiple "Important" or "Critical" patches to apply almost every month.  Most of those patches don’t apply to Hyper-V, but users must still install them and then reboot their hosts.  And, as users are painfully aware, Hyper-V R1’s missing live migration support has meant downtime for their VMs with each reboot.  The downtime may lessen with Hyper-V R2, but the patches won’t.

Patch Tuesday   "Important" Server Core Patches   Patches affecting Hyper-V   Server Core Reboot Required?
Jul 2008        2                                 0                           Yes
Aug 2008        5                                 0                           Yes
Sep 2008        4                                 1                           Yes
Oct 2008        4                                 4                           Yes
Nov 2008        5                                 0                           Yes
Dec 2008        4                                 0                           Yes
Jan 2009        1                                 0                           Yes
Feb 2009        0                                 0                           No
Mar 2009        2                                 1                           Yes
Apr 2009        5                                 0                           Yes
May 2009        0                                 0                           No
Jun 2009        3                                 0                           Yes
Jul 2009        0                                 0                           No
Aug 2009        5                                 0                           Yes

 

The Hyper-V patching situation really points out the need to keep the hypervisor free of dependencies on a general-purpose OS.  Microsoft tried to reduce the OS dependencies with the stripped down Server Core concept, but the numbers above clearly show they didn’t improve life for their customers.  For VMware customers, the truly thin ESXi architecture means no such extraneous patching and rebooting is needed.  If Microsoft has discarded their familiar GUI with Hyper-V Server and Windows Server Core, why do they persist in making Hyper-V dependent on Windows, especially after VMware ESXi has demonstrated that a hypervisor has no technical need for a general-purpose OS?  We can only surmise that Microsoft is trying to extend their Windows franchise with an edict to their business units that all servers must be built on top of Windows.  It’s just too bad their Hyper-V users have to suffer the inconveniences and risks of that OS dependency.

Bugs and Vulnerabilities

Microsoft’s blog then gleefully brings up last year’s ESX 3.5 Update 2 timebomb issue in an effort to find fault in our patching process.  The timebomb bug was a major goof in our release process, we were mortified and we rightly took our lumps, but it had nothing to do with our standard patching process.  Microsoft’s description of the bug as causing two days of downtime for our users is just plain wrong.  Powered on VMs kept right on running when the timebomb activated and we had a patch out in less than 24 hours that, when used together with VMotion and Maintenance Mode, meant few production users suffered any downtime.  Despite the facts, I’m sure Microsoft will remind us of that episode many more times to come.

The residents at Microsoft’s glass house had some other stones to toss our way.  Microsoft pointed to CVE-2009-1244 as an example of a guest breakout vulnerability in ESX and ESXi.  A guest breakout exploit is serious business, but, once again, Microsoft is misrepresenting the facts.  VMware responded quickly to patch that vulnerability in our products, and ESX was much less affected than Microsoft would lead you to believe:

  • The exploit was purely theoretical for generally available versions of ESX and ESXi.
  • No working exploit was ever provided for any version of ESX or ESXi.  The reporters only showed an example of the exploit on VMware Workstation.  They claimed a pre-release version of ESX 4.0 to be vulnerable, but no exploit was demonstrated.
  • The exploit reporters have since acknowledged that released versions of ESX 4.0 and ESXi 4.0 are not vulnerable.

The truth is, vulnerabilities and exploits will never completely go away for any enterprise software, but ESX has been remarkably resistant to such issues.  If it happens again, we’ll find the problem and fix it quickly, as we did for CVE-2009-1244.  I’ll also point out that a guest breakout is a much more serious issue when it drops you into a familiar general-purpose management OS like the Windows Server 2008 parent OS used by Hyper-V than it is with a design like ESXi, where an escape grants access to just a thin, hardened hypervisor like our vmkernel.  A hypervisor that relies on an OS like Win2008 with a history of regular and recurring remote vulnerabilities will always make an easier target for attackers and should not inspire false confidence.

In response to Microsoft’s discussion of their own security practices, I should also point out that VMware has a Secure Software Development Lifecycle (SSDL) as well as a Product Security Policy (PSP) in place.  We invite our customers (and Microsoft bloggers) to learn all about it later this month at VMworld 2009 session TA2543 that we’ve dedicated to the topic.  There’s really no need for the petty sniping by Microsoft on this topic.  Both VMware and Microsoft have rigorous security development processes and both ESX and Hyper-V have achieved the demanding Common Criteria EAL 4 level of certification (actually level 4+ for ESX.)

Performance Benchmarking

The last potshot from Microsoft’s blog was directed at VMware’s policy of requiring review of performance benchmarks of VMware products prior to publication.  Microsoft claims we’re trying to control and distort the truth.  I’ve explained the rationale behind our benchmark policy previously in this blog.   VMware is in no way trying to restrict publication of valid performance data.  In fact, we have approved plenty of benchmarks that show other products leading in tests.  Our friends at Citrix have submitted multiple test results that we have approved.  We’d be happy to do the same for Microsoft, but they have yet to make a request.  I wonder why?

OK, that’s enough attention paid to the Microsoft blog.  I hope to see you all at VMworld shortly where we can all discuss where VMware is going — with SpringSource and our Cloud initiatives, it’s getting pretty exciting around here!

Microsoft Does the Impossible – Eliminates Entire Layer from Hyper-V Without Doing a New Release!

Did you catch the latest video from Microsoft’s virtualization team?  In this one, the Microsoft guys are making the argument that VMware somehow imposes a “virtualization tax” by inserting an additional layer in your datacenter architecture that Microsoft doesn’t need.  I’m familiar with Microsoft’s Hyper-V architecture and knew that as far as the count of layers, it’s no different than VMware.  So what has changed?  Did Microsoft achieve the impossible and remove a complete layer from their virtualization architecture without so much as a service pack?

Here’s how the VMware architecture looks:

VMware_arch

From bottom to top, I count four layers: 1) the x86/x64 hardware; 2) the hypervisor (VMware ESXi); 3) the guest OS in the VM; and 4) the application in the VM.  It’s nothing surprising and the same diagram we’ve used for years to explain how our products work.

Now, let’s take a look at the latest Microsoft architecture using a diagram from their video:

New_Hyper-V_arch

 

Wow, maybe they’re right!  I only count three layers.  How did they do it?  They got rid of an entire layer.  Is virtualization now part of the guest OS?  Maybe they figured out how to make their apps run directly on Hyper-V with no guest OS at all?  It’s especially amazing when all the Hyper-V product presentations I’ve ever seen and even Microsoft’s own virtualization white paper on their web site use a diagram like this:

Old_Hyper-V_arch

This picture clearly shows the same four layers as the VMware architecture with Hyper-V operating as a type 1 or “bare-metal” hypervisor running directly on the hardware.  In fact, compared to the OS-free ESXi architecture, Hyper-V even adds in that extra copy of Windows Server 2008 you see on the left side.  Should we count that as a fifth layer?

So, has Microsoft achieved a software miracle by fully eliminating one or two layers from the Hyper-V architecture?  Are VMware users really stuck paying a “tax” due to an excess layer in our design?  Or could it be that Microsoft has simply redrawn their pictures and changed their story on how Hyper-V really works?  I’ll leave it to sharper minds than mine to uncover the answer to this mystery.

As to the Microsoft claims of costing one-third as much as VMware, repeated yet again in their video, we ask that you not be fooled.  Microsoft may give away Hyper-V with Windows Server 2008, but they are charging big bucks for System Center management and all the servers, databases and agents you need to compare with our combination of VMware ESX and vCenter.  It’s not easy to figure out all the Hyper-V and System Center pieces required to run a given number of VMs, but we’ve done the hard work for you in our Cost-per-Application Calculator.  Give it a try and you’ll see that, even without the VM density advantage you get with our exclusive memory overcommit capability, VMware costs about the same as Microsoft.  You’ll also see that running just a few extra VMs per host with ESX operating at a very conservative level of memory overcommit quickly yields a 20-30% cost advantage for VMware.
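If you want a feel for where that 20-30% figure comes from before trying the calculator, the arithmetic is simple.  Here is a back-of-the-envelope sketch; the dollar amount and VM counts below are hypothetical placeholders, not output from the Cost-per-Application Calculator.

```python
# Back-of-the-envelope cost-per-VM comparison.  All inputs are hypothetical
# placeholders; substitute your own hardware, software and consolidation numbers.
def cost_per_vm(total_host_cost, vms_per_host):
    return total_host_cost / float(vms_per_host)

host_cost = 20000.0            # hardware + hypervisor + management per host (assumed)
vms_without_overcommit = 20    # assumed density with no memory overcommit
vms_with_overcommit = 26       # a few extra VMs from conservative memory overcommit

baseline = cost_per_vm(host_cost, vms_without_overcommit)
with_overcommit = cost_per_vm(host_cost, vms_with_overcommit)
print("Cost per VM without overcommit: $%.0f" % baseline)
print("Cost per VM with overcommit:    $%.0f" % with_overcommit)
print("Advantage from overcommit:      %.0f%%" % (100 * (1 - with_overcommit / baseline)))
```

Running a host at just 30% higher VM density drops the cost per VM by roughly a quarter, which is where the 20-30% range comes from.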

Anyway, nice try with the sequel guys – do you have plans to make it a trilogy?

A Big Step Backwards for Virtualization Benchmarking

Why There's Still a Benchmarking Clause in Our EULA

We have a regularly repeating discussion here at VMware regarding benchmarking that goes along these lines:

Executive A: It seems like most of the people writing about virtualization have figured out that performance benchmarks need some special treatment when used with hypervisors.  It appears that our performance and benchmarking best practices guidelines are making an impact.  They've been available for a while and we're not seeing as many articles with badly flawed tests as we used to.  You know, the tests with bizarre results that come from errors like trying to measure network performance when the CPUs are saturated, or timing benchmark runs using the VM's system clock, or measuring a VM's disk I/O when everything is cached in host memory.  Perhaps it's finally time to drop the clause in our EULA that requires VMware review of performance tests before publication.

Executive B: That would be great!  We respond to every request for a benchmark review and we work with the submitters to improve their test processes, but VMware still gets criticized by competitors who claim we use that clause in our EULA to unfairly block publication of any ESX benchmarks that might not be favorable to VMware.  Even vendors whose benchmarks have been approved by us complain that it's an unreasonable restriction.  If we drop the clause, then maybe everyone will stop complaining and, since it seems people now understand how to benchmark a virtualization solution, we won't see as many botched tests and misleading results.

Executive A: OK, then it's agreed — we'll drop the EULA benchmark clause in our next release.

And then something like this gets published causing us to lose faith once again with the benchmarking wisdom of certain members of the virtualization community and we're back to keeping the clause in our EULA.

Bad Benchmarking in Action

To summarize, the bad news was in a hypervisor benchmark published by Virtualization Review that showed ESX trailing the other guys in some tests and leading in others.  It was a benchmark unlike any we'd seen before and it left us scratching our heads because there were so few details and the results made no sense whatsoever. Of course, Microsoft didn't let the benchmark's flaws stop them from linking to the article claiming it as proof that Hyper-V performs better than other hypervisors.  As near as we can tell, the Virtualization Review test consisted of a bunch of VMs each running a PC burn-in test program along with a database VM running a SQL Server script.  To be fair to Virtualization Review, they had given us a heads up some time ago that they would be running a test and we gave them some initial cautions that weren't heeded, but we certainly never approved publication of the ESX test results.  If we had an opportunity to review the test plan and results, our performance experts would have some long discussions with the author on a range of issues.

Take for instance the results of the third test in the series, as published in the article:

Test 3 Component             Hyper-V   XenServer   VMware ESX
CPU Operations (millions)    5000      3750        7080
RAM Operations (millions)    1080      1250        1250
Disk Operations (millions)   167       187         187
SQL Server (m:ss)            4:43      5:34        5:34

A cursory glance would suggest that one hypervisor demonstrated a performance win in this test. In fact, it is actually very difficult to draw any conclusions from these results.  We at VMware noticed that the ESX numbers reported for CPU Operations seemed to be 40% greater than for Hyper-V and 88% better than for XenServer.  Is ESX really that good, and XenServer and Hyper-V really that bad?  We'd like to take credit for a win, but not with this flawed test.

What’s happening here is that this configuration has a wide variety of problems; we found many of them during our inspection of the tests:

  • The fact that ESX is completing so many more CPU, memory, and disk operations than Hyper-V obviously means that cycles were being used on those components as opposed to SQL Server.  Which is the right place for the hypervisor to schedule resources?  It’s not possible to tell from the scarce details in the results.
  • Resource-intensive SQL Server deployments, whether virtual or native, are typically run with large pages enabled.  ESX supports this behavior but no other hypervisor does, and this test didn’t use that key application and OS feature.
  • The effects of data placement with respect to partition alignment were not planned for.  VMware has documented the impact of this oversight to be very significant in some cases.
  • The disk tests are based on Passmark’s load generation, which uses a test file in the guest operating system.  But the placement of that file, and its alignment with respect to the disk system, was not controlled in this test.
  • The SQL Server workload was custom built and has not been investigated, characterized, or understood by anyone in the industry.  As a result, its sensitivity to memory, CPU, network and storage changes is totally unknown and not documented by the author.  There are plenty of industry standard benchmarks to use with hypervisors, and the days of ad hoc benchmark tests have passed.  Virtual machines are fully capable of running the common benchmarks that users know and understand, like TPC, SPECweb and SPECjbb.  An even better choice is VMmark, a well-rounded test of hypervisor performance that has been adopted by all major server vendors as the standard measurement of virtualization platforms, or the related SPECvirt benchmark under development by SPEC.
  • With ESX’s highest recorded storage throughput already measured at over 100,000 IOPS on hundreds of disks, this experiment’s use of an undocumented, but presumably very small, number of spindles would obviously result in a storage system bottleneck. Yet storage performance results vary by tremendous amounts. Clearly there's an inconsistency in the configuration.

We're Not Against Benchmarking – We’re Only Against Bad Benchmarking

Benchmarking is a difficult process fraught with error and complexity at every turn. It’s important for those attempting to analyze performance of systems to understand what they’re doing to avoid drawing the wrong conclusions or allowing their readers to do so. For those that would like help from VMware, we invite you to obtain engineering assistance from benchmark@vmware.com. And everyone can benefit from the recommendations in the Performance Best Practices and Benchmarking Guidelines paper.  Certainly the writers at Virtualization Review can.

Postscript: Chris Wolf of Burton Group commented on virtualization benchmarks in his blog. He points out the need for vendor independent virtualization benchmarks as promised by programs like SPECvirt.  I couldn't agree more.  VMware got the ball rolling with VMmark, which is a public industry standard, and we're fully supporting development of SPECvirt.

Hyper-V with Server Core — Too Dry and Crunchy for our Taste

We wouldn’t be doing our jobs at VMware if we didn’t regularly compare our products with the competition to ensure our customers get the best technology and user experience possible.  In keeping with that practice, we recently set up Microsoft’s Hyper-V to get a first hand look.  We made sure to follow Microsoft’s documentation and best practices guidance to get a fair comparison and to understand exactly what a Hyper-V user experiences as he or she attempts to deploy and configure Microsoft’s new product.

One key best practice we heard Microsoft’s Hyper-V team stress in sessions at June’s TechEd conference and again last week at VMworld was a strong recommendation to run Hyper-V using the Server Core variant of Windows Server 2008.   Using the smaller Server Core as the Hyper-V parent partition, instead of a full blown instance of Windows Server 2008, strips out Windows features and services not needed to run Hyper-V.  With Server Core, Microsoft is attempting to minimize the attack surface and patching requirements for Windows to make it a safer platform for virtual machines.  I would agree that as Hyper-V requires a general purpose operating system, you might as well make it as small as possible.  The Server Core concept seems like a good idea.  So, following Microsoft’s recommendations, we deployed Hyper-V with Server Core.

Server Core — “The Windows You Know”??

One aim in evaluating Hyper-V was to test its end-user experience, as Microsoft execs repeated over and over at their Sept. 8 virtualization event that Hyper-V would be eagerly adopted because it uses “the Windows you know.”  The insinuation is that Hyper-V is easy — and of course that somehow VMware is not.  Microsoft is claiming that with Hyper-V there is no added learning required, no training, no classrooms: because you already know Windows, you can jump right into Hyper-V.  They are also claiming that VMware ESX requires you to take the time to learn a whole new system.  But is this accurate?  Is the recommended Server Core flavor of Windows 2008 really “the Windows you know”?  Is it easier than ESX?  We wanted to find out.

Servercore800x600_3 

Windows Server Core = MS-DOS 2008

If you haven’t seen Server Core yet, here’s the UI in its entirety.  It doesn’t look like the Windows I know, in fact it looks like DOS!  Are we stepping back in time?  Who knows DOS anymore?  Actually, it makes you wonder why Microsoft didn’t just call it MS-DOS 2008, especially since anyone using Server Core will need to resurrect some long lost command line skills to get any work done.

Is Hyper-V with Recommended Server Core, In Fact, Easy?

So, how does the Hyper-V and DOS — err, I mean Server Core — combination stack up when compared to the user experience of VMware ESXi?  To try it out, we did side-by-side installations of Microsoft Server Core/Hyper-V and VMware ESXi 3.5 on identical servers.  To let you see the details of each setup process, we recorded the entire sequence in a pair of videos.

Hypervisorinstallation2_2
This first video shows every step required to install Hyper-V and ESXi on a fresh machine.  We kept count of the elapsed time, reboots, mouse clicks and keystrokes each product needed, and the results clearly show the huge advantage the truly thin and OS-free ESXi architecture has in installation speed and simplicity.  ESXi goes from bare-metal to fully installed in one-third the time, with half the mouse clicks, hundreds fewer keystrokes and just one reboot versus seven for Hyper-V.  The simplicity of the ESXi wizard-driven installation is striking compared to the arduous process needed to first get the Server Core OS installed and then configure Hyper-V in a command line environment.

Iscsi1_4
Our second video starts where the first left off and takes Hyper-V and ESXi through the steps needed to configure two iSCSI datastores for VM use.  iSCSI setup is a standard task for any virtualization user that wants to take advantage of shared storage for VM migration and high availability.  ESXi’s Windows-based Virtual Infrastructure client makes the iSCSI setup quick and easy.  For Hyper-V, the “Windows you know” is nowhere to be seen.  Instead, working with Server Core requires you to key in a long sequence of obscure commands to configure iSCSI initiators and targets, partitions and file systems.  We generously showed the Hyper-V setup executed with no delays, although it took us hours of digging through Microsoft documents and knowledgebase articles to find the right commands to use when configuring iSCSI in Server Core.

Our Conclusion: Server Core plus Hyper-V is for Experts Only

VMware has put great effort into making ESXi the easiest and fastest hypervisor to install and configure and these videos clearly show the results.  Getting the OS out of the hypervisor plays a big part in the streamlined simplicity of ESXi as there is no general purpose OS to configure and manage and the reliability and security issues accompanying the tens of millions of lines of code an OS brings along are eliminated.  Microsoft’s OS-centric Hyper-V architecture adds many steps to the setup and puts their users in a quandary: either A) they install Hyper-V on a full Windows Server 2008 OS and deal with the frequent patching and security fixes Windows requires; or, B) they follow Microsoft’s best practice guidelines and suffer with the limitations of Server Core.  As the videos show, the tradeoffs with Server Core are daunting — Windows administrators will find their familiar GUI tools are missing and they’ll be left to spend a lot of quality time with search engines tracking down documentation on Microsoft’s obscure command line utilities.

Take a look at the side-by-side comparison videos and let us know if you agree that ESXi provides a far faster and easier (or maybe we should say, “moister and chewier”) setup experience.  Better yet, try ESXi and Hyper-V with Server Core on your own machines and tell us how it went.

A Look at Some VMware Infrastructure Architectural Advantages

Our customers have been asking us for an explanation of the key differences between the VMware ESX hypervisor architecture and the Windows-based Hyper-V architecture they’ve been hearing about recently from Microsoft.  We put together this summary explaining the elements of the ESX architecture that we believe set it apart from Hyper-V and Xen and the reasons behind some of our design decisions.  We thought it would be interesting material for the readers of this blog, so take a look and tell us what you think…

VMware Infrastructure – Architecture Advantages

VMware Infrastructure is a full data center infrastructure virtualization suite that provides comprehensive virtualization, management, resource optimization, application availability and operational automation capabilities in a fully integrated offering. VMware Infrastructure virtualizes the entire IT infrastructure, including servers, storage and networks and aggregates these heterogeneous resources into a simple and uniform set of computing resources in the virtual environment. With VMware Infrastructure, IT organizations can manage resources as a shared utility and dynamically provision them to different business units and projects without worrying about the underlying hardware differences and limitations.

Complete Virtual Infrastructure

VMware_VI_stack_slide_23Jun2008

As shown in the preceding figure, VMware Infrastructure can be represented in three layers:

1. The base layer or virtualization platform is VMware ESX – the highest performing, production-proven hypervisor on the market. Tens of thousands of customers deploy VMware ESX (over 85 percent in production environments) for a wide variety of workloads.

2. VMware Infrastructure’s support for pooling x86 CPU, memory, network and storage resources is the key to its advanced data center platform features. VMware Infrastructure resource pools and clusters aggregate physical resources and present them uniformly to virtual machines for dynamic load balancing, high availability and mobility of virtual machines between different physical hardware with no disruption or downtime.

3. Above the virtual infrastructure layers sits end-to-end application and infrastructure management from VMware that automates specific IT processes, ensures disaster recovery, supports virtual desktops and manages the entire software lifecycle.

VMware ESXi – The Most Advanced Hypervisor

VMware ESXi 3.5 is the latest generation of the bare-metal x86 hypervisor that VMware pioneered and introduced over seven years ago. The industry’s thinnest hypervisor, ESXi is built on the same technology as VMware ESX, so it is powerful enough to run even the most resource-intensive applications; however, it is only 32 MB in size and runs independently of a general-purpose OS.

The following table shows just how much smaller the VMware ESXi installed footprint is compared to other hypervisors. These are results from installing each product and measuring disk space consumed, less memory swap files.

Comparative Hypervisor Sizes (including management OS)

VMware ESX 3.5 2GB
VMware ESXi 32MB
Microsoft Hyper-V with Windows Server 2008 10GB
Microsoft Hyper-V with Windows Server Core 2.6GB
Citrix XenServer v4 1.8GB

As the numbers show, ESXi has a far smaller footprint than competing hypervisors from vendors that like to label ESX as "monolithic."

The ESXi architecture contrasts sharply with the designs of Microsoft Hyper-V and Xen, which both rely on a general-purpose management OS – Windows Server 2008 for Hyper-V and Linux for Xen – that handles all management and I/O for the virtual machines.

Indirect_arch

The VMware ESX direct driver architecture avoids reliance on a heavyweight Windows or Linux management partition OS.

Advantages of the ESX Direct Driver Architecture

Our competition negatively portrays VMware ESX Server as a “monolithic” hypervisor, but our experience and testing prove it to be the best design.

The architecture for Citrix XenServer and Microsoft Hyper-V puts standard device drivers in their management partitions. Those vendors claim this structure simplifies their designs compared to the VMware architecture, which locates device drivers in the hypervisor. However, because Xen and Hyper-V virtual machine operations rely on the management partition as well as the hypervisor, any crash or exploit of the management partition affects both the physical machine and all its virtual machines. VMware ESXi has done away with all reliance on a general-purpose management OS, making it far more resistant to typical OS security and reliability issues. Additionally, our seven years of experience with enterprise customers has demonstrated the impressive reliability of our architecture. Many VMware ESX customers have achieved uptimes of more than 1,000 days without reboots.

ESX_uptime

One of our customers sent us this screenshot showing four years of continuous ESX uptime.

The VMware direct driver model scales better than the indirect driver models in the Xen and Hyper-V hypervisors.

The VMware ESX direct driver model puts certified and hardened I/O drivers directly in the VMware ESX hypervisor. These drivers must pass rigorous testing and optimization steps performed jointly by VMware and the hardware vendors before they are certified for use with VMware ESX. With the drivers in the hypervisor, VMware ESX can provide them with the special treatment, in the form of CPU scheduling and memory resources, that they need to process I/O loads from multiple virtual machines. The Xen and Microsoft architectures rely on routing all virtual machine I/O to generic drivers installed in the Linux or Windows OS in the hypervisor’s management partition. These generic drivers can be overtaxed easily by the activity of multiple virtual machines – exactly the situation a true bare-metal hypervisor, such as ESXi, can avoid. Hyper-V and Xen both use generic drivers that are not optimized for multiple virtual machine workloads.

VMware investigated the indirect driver model, now used by Xen and Hyper-V, in early versions of VMware ESX and quickly found that the direct driver model provides much better scalability and performance as the number of virtual machines on a host increases.

Netperf_scaling

The scalability benefits of the VMware ESX direct driver model became clearly apparent when we tested the I/O throughput of multiple virtual machines compared to XenEnterprise, as shown in the preceding chart from a paper published here. Xen, which uses the indirect driver model, shows a severe I/O bottleneck with just three concurrent virtual machines, while VMware ESX continues to scale I/O throughput as virtual machines are added. Our customers that have compared VMware ESX with the competition regularly confirm this finding. Similar scaling issues are likely with Hyper-V, because it uses the same indirect driver model.

Better Memory Management with VMware ESX

In most virtualization scenarios, system memory is the limiting factor controlling the number of virtual machines that can be consolidated onto a single server. By more intelligently managing virtual machine memory use, VMware ESX can support more virtual machines on the same hardware than any other x86 hypervisor. Of all x86 bare-metal hypervisors, only VMware ESX supports memory overcommit, which allows the memory allocated to the virtual machines to exceed the physical memory installed on the host. VMware ESX supports memory overcommit with minimal performance impact by combining several exclusive technologies.

Memory Page Sharing

Content-based transparent memory page sharing conserves memory across virtual machines with similar guest OSs by seeking out memory pages that are identical across the multiple virtual machines and consolidating them so they are stored only once, and shared. Depending on the similarity of OSs and workloads running on a VMware ESX host, transparent page sharing can typically save anywhere from 5 to 30 percent of the server’s total memory by consolidating identical memory pages.

clip_image008

Transparent Page Sharing.
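Conceptually, content-based page sharing boils down to hashing page contents and backing identical pages with a single reference-counted copy.  The toy sketch below illustrates only the idea; the real vmkernel works on machine pages, verifies candidate matches byte for byte, and breaks sharing with copy-on-write when a guest modifies a shared page.  The page contents and counts are made up for illustration.

```python
# Toy illustration of content-based page sharing: identical pages are stored
# once and reference-counted.  This is a conceptual sketch, not the vmkernel's
# actual implementation.
import hashlib

class PageStore(object):
    def __init__(self):
        self.pages = {}                        # content hash -> [page bytes, refcount]

    def add_page(self, content):
        key = hashlib.sha1(content).hexdigest()
        entry = self.pages.get(key)
        if entry is not None and entry[0] == content:
            entry[1] += 1                      # identical content: share the existing copy
        else:
            self.pages[key] = [content, 1]     # first copy of this content
        return key

    def physical_pages_used(self):
        return len(self.pages)

store = PageStore()
for vm in range(2):                            # two VMs booted from the same guest OS
    for i in range(100):
        store.add_page(("guest-os-code-page-%03d" % i).encode())
print("Logical pages: 200  Physical pages stored:", store.physical_pages_used())
```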

Memory Ballooning

VMware ESX enables virtual machines to manage their own memory swap prioritization by using memory ballooning to dynamically shift memory from idle virtual machines to active virtual machines. Memory ballooning artificially induces memory pressure within idle virtual machines as needed, forcing them to use their own paging areas and release memory for more active or higher-priority virtual machines.

clip_image010

Memory Ballooning.

VMware ESX handles memory ballooning by using a pre-configured swap file for temporary storage if the memory demands from virtual machines exceed the availability of physical RAM on the host server. Memory overcommitment enables great flexibility in sharing physical memory across many virtual machines, so that a subset can benefit from increased allocations of memory, when needed.
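To make the balloon mechanism concrete, here is a toy model of the interaction: the hypervisor asks the balloon driver inside an idle guest to inflate, the guest's own memory manager pages out whatever it values least, and the physical memory pinned behind the balloon is handed back to the hypervisor for busier VMs.  The sizes and policy below are illustrative assumptions, not the vmkernel's actual algorithm.

```python
# Toy model of memory ballooning: reclaiming physical memory from an idle VM
# by inflating the balloon driver inside its guest OS.  Sizes and policy are
# illustrative assumptions only.
class VirtualMachine(object):
    def __init__(self, name, memory_mb):
        self.name = name
        self.memory_mb = memory_mb     # guest-visible memory
        self.balloon_mb = 0            # guest memory pinned by the balloon driver

    def inflate_balloon(self, mb):
        # The guest's own memory manager decides what to page out to satisfy
        # the balloon; the hypervisor never has to inspect guest page contents.
        self.balloon_mb += mb
        return mb                      # physical MB released back to the hypervisor

class Host(object):
    def __init__(self, free_mb):
        self.free_mb = free_mb

    def rebalance(self, idle_vm, needed_mb):
        self.free_mb += idle_vm.inflate_balloon(needed_mb)
        print("Reclaimed %d MB from %s; host free memory is now %d MB"
              % (needed_mb, idle_vm.name, self.free_mb))

host = Host(free_mb=256)
idle_vm = VirtualMachine("idle-file-server", memory_mb=4096)
host.rebalance(idle_vm, needed_mb=512)   # free up memory for a busier VM
```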

Memory Overcommit Provides Lowest Cost of Ownership

The result of this memory conservation technology in VMware ESX is that most customers can easily operate at a 2:1 memory overcommit ratio with negligible performance impact. Our customers commonly achieve much higher ratios. Compared to Xen and Microsoft Hyper-V, which do not permit memory overcommit, VMware Infrastructure customers can typically run twice as many virtual machines on a physical host, greatly reducing their cost of ownership.

Cost_per_VM_chart

TCO Benefits of VMware Infrastructure 3 and its better memory management.

The table above illustrates how a conservative 2:1 memory overcommit ratio results in a lower TCO for even our most feature-complete VMware Infrastructure 3 Enterprise edition, compared to less functional Microsoft and Xen offerings.

Storage Management Made Easy with VMFS

Virtual machines are completely encapsulated in virtual disk files that are either stored locally on the VMware ESX host or centrally managed using shared SAN, NAS or iSCSI storage. Shared storage allows virtual machines to be migrated easily across pools of hosts, and VMware Infrastructure 3 simplifies use and management of shared storage with the Virtual Machine File System (VMFS.) With VMFS, a resource pool of multiple VMware ESX servers can concurrently access the same files to boot and run virtual machines, effectively virtualizing the shared storage and greatly simplifying its management.

VMFS_diagram

VMware VMFS supports and virtualizes shared storage.

While conventional file systems allow only one server to have read-write access to the file system at a given time, VMware VMFS is a high-performance cluster file system that allows concurrent read-write access by multiple VMware ESX servers to the same virtual machine storage. VMFS provides the first commercial implementation of a distributed journaling file system for shared access and rapid recovery. VMFS provides on-disk locking to ensure that multiple servers do not power on a virtual machine at the same time. Should a server fail, the on-disk lock for each virtual machine is released so that virtual machines can be restarted on other physical servers.
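As a conceptual illustration of how on-disk locking with failure recovery can work, picture a lock record that carries an owner and a heartbeat timestamp: a surviving host may break a lock whose heartbeat has gone stale and restart the VM elsewhere.  The sketch below is a toy model of that idea only; it is not VMFS's actual on-disk format, timeout value or protocol.

```python
# Toy model of on-disk locking with heartbeat-based recovery.  A conceptual
# illustration only, not VMFS's actual on-disk structures or timeouts.
HEARTBEAT_TIMEOUT = 15.0   # seconds without a heartbeat before a lock is stale (assumed)

class OnDiskLock(object):
    def __init__(self):
        self.owner = None
        self.last_heartbeat = 0.0

    def try_acquire(self, host, now):
        stale = self.owner is not None and (now - self.last_heartbeat) > HEARTBEAT_TIMEOUT
        if self.owner is None or stale or self.owner == host:
            self.owner, self.last_heartbeat = host, now
            return True
        return False               # another host holds a live lock on this VM

vm_lock = OnDiskLock()
print(vm_lock.try_acquire("esx-01", now=100.0))   # True: lock was free, esx-01 powers on the VM
print(vm_lock.try_acquire("esx-02", now=105.0))   # False: esx-01 is still heartbeating
print(vm_lock.try_acquire("esx-02", now=130.0))   # True: esx-01 failed and its heartbeat went stale
```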

The VMFS cluster file system enables innovative and unique virtualization-based distributed services. These services include live migration of running virtual machines from one physical server to another, automatic restart of failed virtual machines on a different physical server, and dynamic load balancing of virtual machines across different clustered host servers. As all virtual machines see their storage as local attached SCSI disks, no changes are necessary to virtual machine storage configurations when they are migrated. For cases when direct access to storage by VMs is needed, VMFS raw device mappings give VMware ESX virtual machines the flexibility to use physical storage locations (LUNs) on storage networks for compatibility with array-based services like mirroring and replication.

Products like Xen and Microsoft Hyper-V lack an integrated cluster file system. As a result, storage provisioning is much more complex. For example, to enable independent migration and failover of virtual machines with Microsoft Hyper-V, one storage LUN must be dedicated to each virtual machine. That quickly becomes a storage administration nightmare when new VMs are provisioned. VMware Infrastructure 3 and VMFS enable the storage of multiple virtual machines on a single LUN while preserving the ability to independently migrate or failover any VM.

VMFS gives VMware Infrastructure 3 a distributed systems orientation that distinguishes it from our competition.

VMware Infrastructure 3 is the first virtualization platform that supports pooling the resources of multiple servers to offer a new array of capabilities. The revolutionary DRS and HA services rely on VMFS features to aggregate shared storage, along with the processing and network capacity of multiple hosts, into a single pool or cluster upon which virtual machines are provisioned. VMFS allows multiple hosts to share access to the virtual disk files of a virtual machine for quick VMotion migration and rapid restart while managing distributed access to prevent possible corruption. With Hyper-V, Microsoft is just now rolling out a first-generation hypervisor with a single-node orientation. It lacks distributed system features like true resource pooling, and it relies on conventional clustering for virtual machine mobility and failover.

VirtualCenter – Complete Virtual Infrastructure Management

A VirtualCenter Management Server can centrally manage hundreds of VMware ESX hosts and thousands of virtual machines, delivering operational automation, resource optimization and high availability to IT environments. VirtualCenter provides a single Windows management client for all tasks, called the Virtual Infrastructure client. With VirtualCenter, administrators can provision, configure, start, stop, delete, relocate and remotely access virtual machine consoles. The VirtualCenter client is also available in a web browser implementation for access from any networked device. The browser version of the client makes providing a user with access to a virtual machine as easy as sending a bookmark URL.

VC_diagram

VMware VirtualCenter centrally manages the entire virtual data center.

VirtualCenter delivers the highest levels of simplicity, efficiency, security and reliability required to manage a virtualized IT environment of any size, with key features including:

  • Centralized management
  • Performance monitoring
  • Operational automation
  • Clustering and pooling of physical server resources
  • Rapid provisioning
  • Secure access control
  • Full SDK support for integrations

I’ll stop there for now.  All the management and automation and VDI services depicted in the top layer of the figure at the beginning of this post further set us apart from the competition.  Services like Update Manager, SRM, Lab Manager and VDM offer amazing capabilities, but we’ll save that discussion for some upcoming posts.

Reviving the Dormant Grand Architectures of IT with VMotion

Long-deferred vendor visions of agile data centers are finally coming true now that VMware virtualization with VMotion live migration has severed the ties that kept services fixed to x86 hardware.  Unfortunately, some vendors are trying to stage a revival with an inferior substitute for live migration.  Most notably, Microsoft is claiming that their "Quick Migration" feature is comparable to VMware VMotion and adequate for enterprise data centers, even though Quick Migration is not true live migration. We’ve even heard Microsoft tell audiences that our customers don’t trust VMotion enough for production use. Don’t fall for it — VMotion is ready, proven and in heavy use today by VMware customers who are bringing true flexibility and agility to their IT operations.

Do you remember the many grand visions for IT that were trotted out by the vendors and analysts during the dot com boom times? Adaptive Enterprise Computing, Next Generation Data Centers, Organic IT, On-demand Computing, Utility Computing and more were relentlessly pitched to CIOs with PowerPoint promises of continuously available services effortlessly floating on pools of servers and storage, finding the resources they needed all by themselves and magically recovering from any faults and disasters that should arise.  CIOs put up with the daydreaming until the vendors were finally shamed into backing off on the hard sell by their noticeable inability to deliver on the promises. The technology, especially in the x86 world, just could not break the bonds that kept applications and services firmly welded to their physical hosts.

The phenomenal growth of virtualization is now reviving some of those grand IT visions. With a virtualization layer that includes live migration, x86 workloads can float free of the fixed servers and storage hardware that enterprises have in place. And, thanks to tools like VMware VMotion that live migrates servers between hosts and VMware Storage VMotion that allows transparent relocation of a VM’s storage, those workloads finally can accomplish that floating without the slightest interruption to users and services. It’s not just VMware that is enabling this revolution-in-waiting; the Xen vendors are also starting to roll out their own live migration support.

It should not be surprising then, that Microsoft is using its entry into the virtualization market to bring its own grand architecture – the dormant “Dynamic Systems Initiative” – out of hibernation. Now apparently renamed as “Dynamic IT,” their vision was featured in Bob Muglia’s January 21 V-day missive to hundreds of thousands of Microsoft customers and partners.  In laying out the benefits of virtualization and live migration, we couldn’t have said it better ourselves:

"In the data center, virtualization not only supports server consolidation, but it enables workloads to be added and moved automatically to precisely match real-time computing needs as demand changes. This provides greater agility, better business continuity, and more efficient use of resources."

That “moved automatically” part sounds pretty compelling. Of course, you’d only want workloads to get up and move themselves if they could do so without the inconvenience of planned maintenance windows and application downtime. That’s exactly what VMware users have been doing with VMotion since we introduced it in 2003. VMotion delivers true live migration – users and services see no interruptions when a virtual machine is moved from one host to another. VMotion has proven so liberating and reliable that 59% of VMware customers use it regularly in production; some have accumulated hundreds of thousands of perfectly transparent migrations as VMs are automatically load balanced across host clusters with DRS. You don’t need to build a large-scale virtual infrastructure to benefit from VMotion. We see over and over how customers that adopt VMware Infrastructure for basic server consolidation projects quickly come to rely on the agility and freedom of VMotion as an essential element of their IT operations.  Here’s what Qualcomm had to say about the flexibility provided by VMotion:

"We’ve utilized VMotion extremely heavily. It offers so many benefits: being able to deal with downtimes, being able to do maintenance on the hardware supporting ESX Server hosts, and being able to balance resources. VMotion is a must-have capability for anyone seriously thinking of deploying virtual infrastructure."

While we’re gratified to see virtualization taking the lead in reviving Microsoft’s DSI story, its own virtualization tools are missing the crucial live migration support needed to pull it off. It’s important to know that Microsoft dropped plans for live migration in Hyper-V and is relying on a “not quite live” migration method it calls “Quick Migration.” Microsoft Quick Migration works very differently from the iterative live memory transfer method used by VMware VMotion. Quick Migration fully suspends a VM, copies its memory image to disk, and then reloads and resumes the VM on a new host. That suspend/resume migration technique is far from live. In fact, Microsoft has documented (slide 47) that, even in ideal conditions, Quick Migration interrupts VMs for anywhere from eight seconds to two minutes when using Gigabit-speed networked storage, depending on VM memory size.

MSFT_Quick_Migration_slide_794x595
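That eight-second-to-two-minute range is easy to sanity-check with some rough arithmetic.  Assuming around 100 MB/s of effective throughput to Gigabit networked storage (our assumption, not a figure from Microsoft's slide), a suspend/resume migration has to write the VM's memory image to disk on the source host and then read it back on the target:

```python
# Rough sanity check of suspend/resume migration downtime over Gigabit storage.
# The 100 MB/s effective throughput is our assumption; real numbers vary with
# storage hardware, protocol overhead and VM activity.
EFFECTIVE_THROUGHPUT_MB_S = 100.0

def suspend_resume_downtime_s(vm_memory_gb):
    memory_mb = vm_memory_gb * 1024
    # Write the memory image on the source host, then read it back on the target.
    return 2 * memory_mb / EFFECTIVE_THROUGHPUT_MB_S

for gb in (0.5, 1, 2, 4):
    print("%.1f GB VM -> roughly %.0f seconds of downtime" % (gb, suspend_resume_downtime_s(gb)))
```

Even the best case of several seconds is long enough to break TCP sessions, which is exactly what the tests below show.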

Unfortunately, that kind of downtime is more than most networked applications can tolerate. Just a few seconds of unresponsiveness will trigger TCP timeouts and application errors. We tried Quick Migration with the Hyper-V beta using Gigabit iSCSI storage connections and the results weren’t pretty, as you can see in this screen capture video:

(Clicking the full-screen icon will make the window text legible. If that doesn’t work, you can go directly to this movie at blip.tv.)

The Quick Migration downtime caused file copies to fail, severed VM console connections, and forced database clients to be restarted. Scheduling planned maintenance downtime and telling users their apps will be down does not fit anyone’s definition of “Dynamic IT.” In contrast, migrating the same VM with VMotion on a VMware Infrastructure platform didn’t cause even a blip in the network sessions, as this video shows:


(Clicking the full-screen icon will make the window text legible. If that doesn’t work, you can go directly to this movie at blip.tv.)

[Diagram: Windows Server 2008 / Hyper-V Quick Migration test setup]

In anticipation of any concerns that we stacked the deck in this demo to cause Quick Migration failures, we were careful to configure our Hyper-V setup exactly as documented in this Microsoft TechNet article (the only documentation on Quick Migration configuration we could find).  We also used the latest Windows Server 2008 RTM and Hyper-V RC0 releases.  We’ve been repeating this Quick Migration test all the way back to the Viridian CTP release, and with Virtual Server 2005 R2 before that, and we’ve seen the same network session failures every time. A diagram of our Win2008/Hyper-V setup is shown above.

I encourage any readers who’ve tried their own Quick Migration tests to share their experiences.

If you’re wondering why Quick Migration of a VM exhibits these network failures while a normal Microsoft Cluster Service failover keeps network sessions alive, take a look at Mike DiPetrillo’s excellent explanation.  In short, during the time it takes to suspend and then resume a quick-migrating VM, there is no network stack available to respond at its IP address.  Mike also explains how VMotion preserves network connections and shows how VMotion has become an indispensable money-saver for VMware customers.

While Microsoft may not be ready to deliver the true live migration needed for Dynamic IT in Hyper-V, their customers don’t need to defer their dreams of automated workload migrations and resource balancing. Microsoft operating systems and applications run great in VMware virtual machines and users can take full advantage of powerful virtualization services like VMotion, DRS, HA, Consolidated Backup, Storage VMotion and Update Manager. Live migration is finally letting Dynamic IT and all the other grand architectures of IT live up to their promises of data center agility and hardware independence.  Customers just need to ensure they choose a virtualization technology that supports true live migration like VMware VMotion and not get trapped with an inadequate substitute.

More on VMware Memory Overcommit, for Those Who Don’t Trust the Numbers

James O’Neill of Microsoft had a pretty sharp reaction to my last post. He accused me of cooking the numbers to exaggerate the cost-saving benefits of VMware’s memory overcommit feature. To help James understand just how memory overcommit works, I’ve taken the numbers out of the argument and used simple algebra to compare the TCO per VM for a host running either VI3 or a Microsoft hypervisor. Let’s do the math…

Variables

C_H    Cost of server hardware
C_M    Cost of memory, per GB
C_VMW  Cost of VMware virtualization software
C_MS   Cost of Microsoft virtualization software
C_OS   Cost of operating system software
M_H    Physical server memory, GB
M_V    Memory per VM, GB
r      Memory overcommit ratio

Total system cost
  VMware:    C_H + C_M·M_H + C_VMW + C_OS
  Microsoft: C_H + C_M·M_H + C_MS + C_OS

Number of VMs
  VMware (with overcommit):       r·M_H / M_V
  Microsoft (without overcommit): M_H / M_V

Cost per VM
  VMware:    (C_H + C_M·M_H + C_VMW + C_OS) / (r·M_H / M_V)
  Microsoft: (C_H + C_M·M_H + C_MS + C_OS) / (M_H / M_V)

Let’s simplify a little…
  VMware:    (C_H + C_M·M_H + C_VMW + C_OS) · M_V / (r·M_H)
  Microsoft: (C_H + C_M·M_H + C_MS + C_OS) · M_V / M_H

Now let’s make the assumption that the Microsoft virtualization software is free (C_MS = 0), and let’s go to extremes and tilt the numbers in Microsoft’s favor by assuming that the hardware and memory are also free (C_H = 0, C_M = 0).
  VMware:    (C_VMW + C_OS) · M_V / (r·M_H)
  Microsoft: C_OS · M_V / M_H

Now let’s plug in some real numbers for a 2-socket server. We’ll use a conservative VMware memory overcommit ratio of 2, the list price of VMware Infrastructure Enterprise ($5,750) and the list price for Windows Server Data Center Edition ($5,998). The common factor M_V/M_H appears in both expressions, so it drops out of the comparison.
  VMware:    ($5,750 + $5,998) / 2
  Microsoft: $5,998 / 1

Finally, we get total cost per VM for both products:
  VMware:    $5,874
  Microsoft: $5,998

So, you can see that in this extreme case where hardware is free, the VMware Infrastructure system still beats Microsoft Hyper-V, or any other hypervisor, in total cost per VM in a Windows environment. Plugging in any realistic costs for hardware and memory just tilts the balance further in VMware’s favor. I’m also using our most feature-rich VI3 Enterprise product in the example – had I used one of our products closer to the bare-bones capabilities of Hyper-V, like ESX Server 3i or VI3 Foundation, the numbers would look even worse for Microsoft.
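
For readers who would rather check the algebra with code than on paper, here’s a minimal Python sketch of the same model. The prices and the 2:1 overcommit ratio come straight from the example above; the host and per-VM memory sizes are set equal so the common M_V/M_H factor drops out of the comparison, just as in the worked numbers.

```python
def cost_per_vm(hw, mem_per_gb, host_mem_gb, hypervisor, os, vm_mem_gb, overcommit=1.0):
    """Total acquisition cost of a host divided by the number of VMs it can run.
    Without memory overcommit, VM count is host memory / per-VM memory;
    an overcommit ratio r multiplies that count."""
    total = hw + mem_per_gb * host_mem_gb + hypervisor + os
    vms = overcommit * host_mem_gb / vm_mem_gb
    return total / vms

# Extreme case from the post: hardware and memory treated as free, and the
# host/VM memory sizes set equal so the common factor cancels out.
vmware = cost_per_vm(hw=0, mem_per_gb=0, host_mem_gb=1, hypervisor=5750, os=5998,
                     vm_mem_gb=1, overcommit=2.0)
microsoft = cost_per_vm(hw=0, mem_per_gb=0, host_mem_gb=1, hypervisor=0, os=5998,
                        vm_mem_gb=1)

print(f"VI3 Enterprise:  ${vmware:,.0f} per VM")     # -> $5,874
print(f"Free hypervisor: ${microsoft:,.0f} per VM")  # -> $5,998
```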

James also criticized my failure to include software support costs in my example. First, this analysis considers up-front acquisition costs, not factors like support, maintenance and depreciation, which are spread over the lifetime of the products. Second, I also excluded Microsoft’s Windows Server support costs, which are comparable to those of VI3.

James also seems to think our customers aren’t using memory overcommit in the real world. The fact is that most VI3 users run with some level of overcommit, and a few have kindly posted responses to that effect on my previous post. James may also want to have a chat with his VP at Microsoft. Here’s what Bob Muglia had to say about shared memory in a recent interview:

[Bink] We talked about Vmware ESX and its features like shared memory between VMs, [Muglia] "we definitely need to put that in our product"

James may not be a believer in the savings from memory overcommitment made possible by VMware’s exclusive memory sharing technologies, but apparently others at Microsoft are.

Cheap Hypervisors: A Fine Idea — If You Can Afford Them


Virtualization customers should focus on cost per VM more than upfront license costs when choosing a hypervisor.  VMware Infrastructure’s exclusive ability to overcommit memory gives it an advantage in cost per VM the others can’t match.

Our competition and a few industry observers have lately taken up the sport of bashing VMware Infrastructure as overpriced. Microsoft is playing up its plans to bundle Hyper-V with Windows Server 2008 as a way to undercut VI3 pricing and jump-start its late entry into the virtualization market. One of our Xen-based competitors has even adopted a marketing tag line of “one-fifth the cost of comparable alternatives,” clearly referring to us.

VMware Infrastructure customers and prospective users should not be misled by those accusations of inflated prices. Our rivals are simply trying to compensate for limitations in their products with aggressive pricing. In defense of our pricing, I could go into details about the powerful virtual infrastructure features you get with VMware Infrastructure 3 Enterprise that the competition is still far from matching. I could also describe the great bargains we offer with our VMware Infrastructure Acceleration Kits. I could explain that our competition is prone to apples-to-oranges comparisons and that their offerings should really be weighed against our small-business VMware Infrastructure Foundation bundle or the VMware Infrastructure Standard bundle that adds high availability. I could steer those of you looking for the absolute lowest-cost enterprise bare-metal hypervisor to VMware ESX Server 3i for $495 – the thinnest technology available. VMware also has a great TCO/ROI calculator to help you decide, but if all that seems like too much work, let me propose a simpler metric for comparing hypervisors – cost per virtual machine.

Cost per VM is not that hard to measure: just add up the costs for the server, the virtualization software, the operating systems and the application software; then start adding VMs running your workloads until they can no longer meet your required service levels. We’ve actually done that work for you and you might find the results surprising.

We took a common dual-socket server with 4GB of RAM and tried the test with ESX Server 3, Citrix XenServer v4 and the Microsoft Hyper-V beta. We created and powered on 512MB Windows XP VMs running a light workload and kept adding them until the server couldn’t take any more. Our Hyper-V and XenServer tests topped out at six and seven VMs respectively, which was expected. You see, both those products subtract the full amount of memory allocated to each running VM from the host’s physical RAM. When you factor in the additional memory required by the hypervisor and the management OS, there’s room left for at most seven VMs. In fact, XenServer and Hyper-V will flat out refuse to let you power on an additional VM, warning that memory resources have been exhausted, as shown in the screen shots below. XenServer and Hyper-V can’t do what we call “overcommitting” memory, and that should strike you as tremendously wasteful when most data center VMs are lightly utilized.

[Screen shot: XenServer error when powering on an additional VM]

Citrix XenServer v4 does not support memory overcommit, so a 4GB server is only able to support seven 512MB VMs.

[Screen shot: Hyper-V error when powering on an additional VM]

Microsoft Hyper-V beta is also missing memory overcommit support and only handles six running VMs.
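
If you want to sanity-check those VM counts, the arithmetic for a hypervisor that cannot overcommit memory fits in a few lines of Python. The overhead figures for the hypervisor and its management OS are illustrative assumptions, not measurements.

```python
import math

def max_vms_without_overcommit(host_ram_gb, vm_ram_gb, overhead_gb):
    """A hypervisor that reserves each VM's full memory allocation up front can
    only power on as many VMs as fit in physical RAM after its own overhead."""
    return max(0, math.floor((host_ram_gb - overhead_gb) / vm_ram_gb))

# 4 GB host, 512 MB VMs. The overhead figures are illustrative assumptions:
# a thin Xen control domain vs. a full Windows Server 2008 parent partition.
print(max_vms_without_overcommit(4, 0.5, 0.4))   # -> 7, like the XenServer result
print(max_vms_without_overcommit(4, 0.5, 1.0))   # -> 6, like the Hyper-V result
```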

So how did ESX Server fare in the same test? Before I get to the results, I should explain two very important memory management features built into ESX Server. The first is called Transparent Page Sharing, and I’ve always considered it one of our cleverest features. Transparent Page Sharing takes advantage of the fact that VMs running similar operating systems tend to load identical contents into many of their memory pages. If you’re running 10 Windows Server 2003 VMs, you’d expect identical chunks of the Windows OS to be in memory. Transparent Page Sharing finds those matching pages across all the VMs and keeps just a single copy of each. If one VM later modifies a shared page, ESX Server gives it a private copy and tracks that difference separately. It’s not quite as trivial as I make it sound; there’s a lot of careful optimization built in to scan for candidate pages when the VMs are idle and to verify that pages really do match before they’re shared. The effort we put into developing Transparent Page Sharing pays off big for our users, with dramatic reductions in per-VM memory consumption and minimal performance impact.

sld06a

VMware ESX Server uses exclusive Transparent Page Sharing technology to save a single copy of similar guest OS memory pages.
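
The real implementation scans and hashes guest pages in the background and backs shared pages with copy-on-write; the toy Python sketch below shows only the core idea of content-based sharing, and every name in it is invented for illustration.

```python
import hashlib

class SharedPagePool:
    """Toy model of content-based page sharing: identical guest pages are
    stored once and reference-counted; writes trigger copy-on-write."""

    def __init__(self):
        self.pages = {}       # content hash -> the single stored copy
        self.refcount = {}    # content hash -> number of guest pages mapped to it

    def map_page(self, content):
        key = hashlib.sha1(content).hexdigest()
        if key not in self.pages:
            self.pages[key] = content
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def write_page(self, key, new_content):
        """Copy-on-write: the writing VM gets its own copy; the shared copy is
        freed once no guest page references it any more."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.pages[key], self.refcount[key]
        return self.map_page(new_content)

pool = SharedPagePool()
# Ten VMs loading the same 4 KB chunk of a guest OS are backed by one stored page.
keys = [pool.map_page(b"\x90" * 4096) for _ in range(10)]
print(len(pool.pages), "physical page(s) backing 10 identical guest pages")  # -> 1
pool.write_page(keys[0], b"\x00" * 4096)   # one VM modifies its page
print(len(pool.pages), "physical page(s) after one copy-on-write")           # -> 2
```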

Our other memory technology is called the balloon driver, and it’s part of the VMware Tools you load in each VM. The balloon driver process (vmmemctl) recognizes when a VM’s memory is idle and exerts artificial pressure on the guest OS, causing it to page out that memory to disk. The freed-up memory is then reclaimed for use by other, more active VMs.

sld07a

The VMware guest balloon driver frees memory in idle VMs for use by active VMs.
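
Conceptually, the balloon driver is a guest-resident process that the hypervisor can ask to allocate memory, which pushes the guest OS to page out its least-used data so the underlying physical pages can be handed to busier VMs. Here’s a rough, purely illustrative sketch of that idea; the function name, formula and thresholds are invented for this example and are not VMware’s actual policy.

```python
def balloon_target_mb(configured_mb, active_mb, host_pressure):
    """How far to inflate the balloon inside a mostly idle guest.
    host_pressure runs from 0.0 (plenty of free host RAM) to 1.0 (fully
    committed); the formula and names are invented for illustration."""
    idle_mb = max(0, configured_mb - active_mb)
    return int(idle_mb * host_pressure)

# A 1024 MB VM actively touching only ~300 MB, on a heavily committed host:
# the balloon driver allocates ~651 MB inside the guest, the guest pages that
# memory out to its own disk, and ESX reclaims the backing physical pages.
print(balloon_target_mb(configured_mb=1024, active_mb=300, host_pressure=0.9))  # -> 651
```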

Working together, Transparent Page Sharing and the balloon driver let ESX Server comfortably support memory overcommitment. You can learn more about our memory management technologies in this white paper. Now, getting back to our VM density test, how did ESX Server do? Here’s the screen shot:

[Screen shot: VirtualCenter showing 40 running VMs on one host]

VMware Infrastructure 3 with memory overcommitment supports 40 concurrent VMs!

Those 40 VMs have more than 20GB of total RAM allocated, and they are running fine on a server with 4GB of physical RAM – a 5:1 memory overcommit ratio. Our exclusive ability to efficiently overcommit memory lets VMware Infrastructure support more than five times as many VMs on the same hardware as our competition! We repeated the test using Windows 2000 Server VMs running SQLIOSim to see how we fared with heavily loaded VMs. Hyper-V and XenServer again topped out at six and seven VMs when they hit their memory limits, but the ESX Server platform ran fine with 14 VMs – twice as many as the other hypervisors!

Now, let’s get back to the cost per VM comparison to see which hypervisors provide the most bang for the buck. In the table below, we add up the costs for a basic hypervisor deployment. We’ll assume a 2-way, 4GB server costs us $6,000. Next, we add the cost to run Windows in each VM. For that, we’ll take advantage of Microsoft’s policy that lets us run an unlimited number of Windows VMs on a host licensed with Windows Server Data Center Edition (and yes, that policy also applies to VMware and Xen hosts). Licensing Windows Server Data Center Edition costs us $5,998 for two sockets. After that, we plug in the cost of the VMware Infrastructure 3 licenses, and to make things interesting, we’ll assume the competing hypervisor is absolutely free.

The next row in the table shows how many concurrent 512MB VMs each hypervisor can support. For VI3, we’re assuming a conservative 2:1 memory overcommit ratio based on our heavy workload test, which lets us run 14 VMs. For our hypothetical free hypervisor, we’re stuck at seven VMs because memory overcommit isn’t an option. That’s right, no other hypervisor technology allows memory overcommitment – it’s a VMware exclusive.

[Table: up-front costs and cost per VM, VMware Infrastructure 3 Enterprise vs. a free hypervisor]

Finally, we do the division and find that even our high-end VI3 Enterprise bundle beats a free hypervisor in cost per VM! Going with any other hypervisor means you’ll need more hardware, network and storage connections, switch ports, floor space, power and cooling to support a given population of VMs. That should make your decision easy if all you’re doing is simple server consolidation, but there’s more to consider. VI3 Enterprise includes a powerful array of virtual infrastructure services like VMotion, DRS, HA and more that let you automate, optimize and protect your operations, and those features put us far ahead of the offerings from the Xen vendors and Microsoft.
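
For anyone who wants to reproduce the division, here’s a short Python sketch using the figures above. The VI3 Enterprise license price is assumed to be the $5,750 two-socket list price quoted in the memory overcommit post earlier on this page; treat the per-VM results as approximate.

```python
def per_vm_cost(server, windows_datacenter, hypervisor, vm_count):
    """Up-front acquisition cost of one host divided by the VMs it can run."""
    return (server + windows_datacenter + hypervisor) / vm_count

# 2-way, 4GB server and VM counts from the heavy-workload test above; the VI3
# Enterprise license price is an assumption based on the list price quoted earlier.
vi3 = per_vm_cost(6000, 5998, 5750, vm_count=14)   # 2:1 memory overcommit
free = per_vm_cost(6000, 5998, 0, vm_count=7)      # no overcommit possible

print(f"VI3 Enterprise:  ${vi3:,.0f} per VM")   # roughly $1,268
print(f"Free hypervisor: ${free:,.0f} per VM")  # roughly $1,714
```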

If you’re ready to get started consolidating your servers, don’t be lured by seemingly low-cost hypervisors into a decision that will limit your VM density and lock you into spending more on hardware. Instead, put memory overcommitment at the top of your list of hypervisor feature requirements. You’ll spend less on the project by stretching your hardware further, and, since only VMware has memory overcommitment, you’ll get the proven reliability and advanced virtualization features of VMware Infrastructure thrown in for free. Beware the high cost of a “free” hypervisor.

[Update: More on VMware Memory Overcommit, for Those Who Don’t Trust the Numbers]