I just came back from a few days' vacation to find multiple emails in my inbox from VMware customers and partners looking for a response to a series of bizarrely rambling posts (here, here, here and here) on Microsoft's virtualization team blog. Normally, we'd avoid a tit-for-tat exchange, but the Microsoft postings contained some confusing and erroneous depictions of VMware technology that I hope to address and correct here.
Hypervisor Disk Footprints
We've consistently taken the position that a smaller hypervisor is inherently better, and we've found that most people agree with us, including Microsoft Technical Fellow Mark Russinovich (see his presentation from Burton Group's Catalyst Conference in July). The reasoning is that every line of code unavoidably adds reliability and security risks. Microsoft has cited those same "smaller attack surface" benefits of code size reduction as the motivation for their slimmed-down Server Core and Hyper-V Server alternatives. We don't know how many lines of code are in a Hyper-V system, so we use the installed disk footprint -- the size of the installed files needed to support virtual machines -- as a reasonable proxy for lines of code. In calculating hypervisor disk footprints, we need to follow a few rules to ensure consistency:
- The installation must be sufficient to run VMs and support all advertised features.
- Any operating systems in management partitions, Dom0s or service consoles needed by the hypervisor should be included.
- Management of the hypervisor can be from a remote client, so local management clients can be excluded.
- Pagefiles, swap partitions, scratch/temp partitions and core dump partitions can be excluded.
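The counting rules above can be expressed as a filter over an inventory of installed components. The following is a minimal Python sketch, assuming a hypothetical inventory in which each component is already tagged with a category; the `Component` type and the category labels are illustrative and do not come from any VMware or Microsoft tool. It also assumes the inventory is complete enough to run VMs with all advertised features (the first rule), which the code itself cannot check.

```python
from dataclasses import dataclass

# Hypothetical category labels for items the rules say may be excluded.
EXCLUDED = {"local_mgmt_client", "pagefile", "swap", "scratch", "core_dump"}


@dataclass
class Component:
    name: str
    size_bytes: int
    category: str  # illustrative labels, e.g. "hypervisor", "mgmt_os", "swap"


def disk_footprint(components):
    """Sum the sizes of components that count toward the hypervisor footprint.

    Hypervisor code and any management OS (parent partition, Dom0, service
    console) are counted; local management clients and pagefile, swap,
    scratch and core dump space are excluded, following the rules above.
    """
    return sum(c.size_bytes for c in components if c.category not in EXCLUDED)
```

Keeping the exclusions in a single set makes the counting policy explicit, so two footprints are only compared when they were computed under the same rules.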
In the case of ESXi, here's the sequence we followed to calculate its disk footprint:
- Start by installing ESXi 4 on a bare server.
- Use the vSphere Tech Support Mode to display the contents of the ESXi boot images in the /bootbank directory.
- A df -h command will then show you that the total size of those compressed ESXi boot images in the directory corresponding to /bootbank is 59.3MB -- somewhat less than the 70MB figure we've publicly stated. The other partitions in the listing are either loaded only in memory (/), or they are excluded per the rules above. Note that this is not just a stripped down ESXi installation, it is a fully capable ESXi host supporting all licensed vSphere features.
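`df -h` reports the space used on the filesystem mounted at `/bootbank`. As a rough cross-check, a script like the one below sums the apparent sizes of the regular files under a directory. This is an illustrative Python sketch, not a VMware tool; it assumes the boot image files are readable on a system where Python is available (for example, a copy of the `/bootbank` contents), and its total can differ slightly from `df` or `du` because those count allocated blocks and filesystem metadata rather than apparent file sizes.

```python
import os


def directory_bytes(root):
    """Return the total apparent size, in bytes, of regular files under root.

    Symbolic links are skipped so linked files are neither double-counted
    nor followed outside the tree (os.walk does not descend into linked
    directories by default).
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def to_mib(n_bytes):
    """Convert bytes to MiB (powers of 1024, the convention df -h uses)."""
    return n_bytes / (1024 * 1024)
```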
The disk footprints we measured for Hyper-V R2 RTM are far larger. Windows Server 2008 R2 Server Core with the Hyper-V role enabled was 3.6GB. For those Hyper-V users who want to preserve the "Windows they know," a full Windows Server 2008 R2 installation is pushing 10GB. (Hyper-V Server R2 RTM is not available to us yet, but we expect its footprint to fall between those of Server Core R2 and full Windows Server 2008 R2, as it did for R1.) For the graphically inclined, here's a comparison that shows just how much less "surface area" ESXi presents to bugs and attacks.
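To put those figures side by side, the ratios can be worked out directly from the numbers in the text. This short Python sketch assumes binary units (1 GB = 1024 MB), matching the powers-of-1024 convention of `df -h`; with decimal units (1 GB = 1000 MB) the ratios come out slightly lower, at about 61x and 169x.

```python
ESXI_MB = 59.3  # measured ESXi 4 /bootbank footprint, from the text

FOOTPRINTS_GB = {
    "Windows Server 2008 R2 Server Core + Hyper-V": 3.6,
    "Full Windows Server 2008 R2": 10.0,  # "pushing 10GB" in the text
}


def ratio_to_esxi(size_gb, mb_per_gb=1024):
    """How many times larger a footprint (in GB) is than the ESXi footprint."""
    return size_gb * mb_per_gb / ESXI_MB


for name, gb in FOOTPRINTS_GB.items():
    print(f"{name}: {ratio_to_esxi(gb):.0f}x the ESXi footprint")
```

Under the binary-unit assumption, Server Core with Hyper-V works out to roughly 62 times the ESXi footprint, and a full Windows Server 2008 R2 installation to roughly 173 times.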
Yes, ESX "Classic" does use a Linux-based service console and therefore has a larger disk footprint, but VMware has publicly stated that the OS-free ESXi architecture is our future direction and ESXi has all the capabilities of ESX "Classic". Microsoft has made no such commitments to eliminate Hyper-V's dependency on Windows. In fact, Microsoft CEO Steven Ballmer has stated that, "Our view is that virtualization is something that should be built into the operating system." Not very encouraging for those hoping to see Hyper-V decoupled from the Windows monolith.
Microsoft's explanation in their blog, "our entire footprint which is made up mostly of stuff that isn't exposed to VM traffic at all or only exposed indirectly," isn't something I'd want to boast about, and it's exactly the thinking we wanted to get away from with the ESXi architecture. We made ESXi exclusively dedicated to VM support and it doesn't bring along the baggage of a general purpose OS. Why would you want your hypervisor to be dependent on the proper functioning and security of tens of millions of lines of code that have nothing to do with supporting your VMs?
Microsoft's blog then moves on to argue that patching is somehow less of a burden with Hyper-V because the aggregate size of its patches is smaller than for ESXi. I'll give them credit for creativity in coming up with that argument, but it's really meaningless. Because ESXi is installed and patched like an appliance -- the entire image is replaced as a whole -- our patches are naturally the size of the full ESXi installer package. Our customers prefer that appliance approach because it ensures consistency in their installations and avoids "patch drift" away from a validated configuration. With the Windows Update-based patching used for Hyper-V, patches can be smaller, but customers can skip or miss patches, resulting in insecure, partially patched configurations.
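The difference between the two patching models can be illustrated with a toy comparison. In this Python sketch, which is a hypothetical model rather than how either vendor's tooling actually works, an image-patched host is described by a single build identifier, while an incrementally patched host is described by the set of patches it has applied; drift is whatever separates the host from a validated baseline. The build and patch identifiers are made up for illustration.

```python
def image_drift(host_build, validated_build):
    """Image-based (appliance) update: the whole image is replaced, so a host
    either runs the validated build or it does not; there is no partial state."""
    return host_build != validated_build


def incremental_drift(applied, baseline):
    """Incremental update: report patches the host is missing relative to the
    validated baseline, plus any extra patches the baseline does not include."""
    applied, baseline = set(applied), set(baseline)
    return {
        "missing": sorted(baseline - applied),
        "extra": sorted(applied - baseline),
    }


# Hypothetical host that skipped one patch from a three-patch baseline.
print(incremental_drift(["P-001", "P-003"], ["P-001", "P-002", "P-003"]))
```

The usage line reports `P-002` as missing: with incremental patching, a skipped patch leaves a host in a state that no single version number describes, which is the "partially patched" case the paragraph above refers to.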
What really matters in patching is the number and disruptiveness of the patches. With ESXi, we've dramatically cut down on the number of patches customers need to download and apply. The biggest reason for the reduction is the elimination of the Linux-based service console. The more frequent rate of ESX "Classic" patches is mainly due to our approach of playing it safe for customers by distributing patches for any issues affecting the Linux-based service console, even though most of those patches aren't needed by customers because the Linux services addressed by the patches are normally disabled in an ESX installation. Also, we do patch ESX "Classic" incrementally using the "surgical fix" approach with smaller patches that Microsoft seems to advocate.
With both ESX and ESXi, a host reboot following patching has always been a non-issue because VMotion and Maintenance Mode make it trivial to shift VMs to alternate hosts during the reboots. Microsoft's customers must certainly be looking forward to using those same features in the long-awaited release of Hyper-V R2.
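The rolling-reboot pattern described above can be sketched as a simple loop: evacuate one host, patch and reboot it, then move on to the next. This is a toy Python model, not the vSphere API; it assumes live migration always succeeds and ignores capacity, affinity rules and DRS placement. The host and VM names are hypothetical.

```python
def rolling_patch(hosts):
    """Patch every host in turn without powering off any VM.

    hosts maps a host name to the list of VMs it is running (mutated in
    place). Before a host is patched and rebooted, it enters maintenance
    mode: each of its VMs is live-migrated to the least-loaded peer.
    Returns the order in which hosts were rebooted.
    """
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to evacuate VMs")
    reboot_order = []
    for host in list(hosts):
        peers = [h for h in hosts if h != host]
        # Enter maintenance mode: move each VM to the least-loaded peer.
        for vm in hosts[host]:
            target = min(peers, key=lambda h: len(hosts[h]))
            hosts[target].append(vm)
        hosts[host] = []
        # The host now runs no VMs, so patching and rebooting it costs no VM
        # downtime; afterwards it exits maintenance mode and accepts VMs again.
        reboot_order.append(host)
    return reboot_order


cluster = {"esx1": ["vm1", "vm2"], "esx2": ["vm3"], "esx3": []}
print(rolling_patch(cluster), cluster)
```

The key property is that every VM stays running on some host throughout, so the reboot affects only the host, not the workloads.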
However, what must really be frustrating to Hyper-V users is the need to constantly patch and reboot Hyper-V hosts with miscellaneous Windows Server 2008 patches that have nothing to do with virtualization. Even if you use the stripped-down "Server Core" version of Windows Server 2008 that Microsoft recommends for production Hyper-V systems, you're almost guaranteed to need a host reboot every "Patch Tuesday." We've kept track of the "Patch Tuesday" patches required on a Server Core Hyper-V system since Hyper-V first shipped in June 2008, and there have been multiple "Important" or "Critical" patches to apply almost every month. Most of those patches don't apply to Hyper-V, but users must still install them and then reboot their hosts. And, as users are painfully aware, Hyper-V R1's missing live migration support has meant downtime for their VMs with each reboot. The downtime may lessen with Hyper-V R2, but the patches won't.
| Patch Tuesday | "Important" Server Core patches | Patches affecting Hyper-V | Server Core reboot required? |
|---|---|---|---|
| Jul 2008 | 2 | 0 | Yes |
| Aug 2008 | 5 | 0 | Yes |
| Sep 2008 | 4 | 1 | Yes |
| Oct 2008 | 4 | 4 | Yes |
| Nov 2008 | 5 | 0 | Yes |
| Dec 2008 | 4 | 0 | Yes |
| Jan 2009 | 1 | 0 | Yes |
| Feb 2009 | 0 | 0 | No |
| Mar 2009 | 2 | 1 | Yes |
| Apr 2009 | 5 | 0 | Yes |
| May 2009 | 0 | 0 | No |
| Jun 2009 | 3 | 0 | Yes |
| Jul 2009 | 0 | 0 | No |
| Aug 2009 | 5 | 0 | Yes |
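The table can be tallied to make the point concrete. This Python sketch simply transcribes the rows above and counts them; it adds no data of its own.

```python
# Transcribed from the Patch Tuesday table above (Jul 2008 to Aug 2009).
MONTHS = ["Jul 2008", "Aug 2008", "Sep 2008", "Oct 2008", "Nov 2008",
          "Dec 2008", "Jan 2009", "Feb 2009", "Mar 2009", "Apr 2009",
          "May 2009", "Jun 2009", "Jul 2009", "Aug 2009"]
SERVER_CORE = [2, 5, 4, 4, 5, 4, 1, 0, 2, 5, 0, 3, 0, 5]
HYPER_V = [0, 0, 1, 4, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
REBOOT = [True, True, True, True, True, True, True,
          False, True, True, False, True, False, True]


def summarize():
    """Count patches and reboots, including reboots with no Hyper-V patch."""
    assert len(MONTHS) == len(SERVER_CORE) == len(HYPER_V) == len(REBOOT)
    return {
        "months": len(MONTHS),
        "server_core_patches": sum(SERVER_CORE),
        "hyper_v_patches": sum(HYPER_V),
        "reboot_months": sum(REBOOT),
        # Months where a reboot was required but no patch touched Hyper-V.
        "reboots_without_hyper_v_patch": sum(
            r and hv == 0 for r, hv in zip(REBOOT, HYPER_V)),
    }


print(summarize())
```

Over these 14 months the tally comes to 40 Server Core patches, 6 of which affected Hyper-V, with reboots required in 11 months; in 8 of those 11 months, no patch affecting Hyper-V was involved at all.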
The Hyper-V patching situation really points out the need to keep the hypervisor free of dependencies on a general-purpose OS. Microsoft tried to reduce the OS dependencies with the stripped down Server Core concept, but the numbers above clearly show they didn't improve life for their customers. For VMware customers, the truly thin ESXi architecture means no such extraneous patching and rebooting is needed. If Microsoft has discarded their familiar GUI with Hyper-V Server and Windows Server Core, why do they persist in making Hyper-V dependent on Windows, especially after VMware ESXi has demonstrated that a hypervisor has no technical need for a general-purpose OS? We can only surmise that Microsoft is trying to extend their Windows franchise with an edict to their business units that all servers must be built on top of Windows. It's just too bad their Hyper-V users have to suffer the inconveniences and risks of that OS dependency.
Bugs and Vulnerabilities
Microsoft's blog then gleefully brings up last year's ESX 3.5 Update 2 timebomb issue in an effort to find fault in our patching process. The timebomb bug was a major goof in our release process. We were mortified and rightly took our lumps, but it had nothing to do with our standard patching process. Microsoft's description of the bug as causing two days of downtime for our users is just plain wrong. Powered-on VMs kept right on running when the timebomb activated, and we had a patch out in less than 24 hours that, when used together with VMotion and Maintenance Mode, meant few production users suffered any downtime. Despite the facts, I'm sure Microsoft will remind us of that episode many more times to come.
The residents at Microsoft's glass house had some other stones to toss our way. Microsoft pointed to CVE-2009-1244 as an example of a guest breakout vulnerability in ESX and ESXi. A guest breakout exploit is serious business, but, once again, Microsoft is misrepresenting the facts. VMware responded quickly to patch that vulnerability in our products, and ESX was much less affected than Microsoft would lead you to believe:
- The exploit was purely theoretical for generally available versions of ESX and ESXi.
- No working exploit was ever provided for any version of ESX or ESXi. The reporters only showed an example of the exploit on VMware Workstation. They claimed that a pre-release version of ESX 4.0 was vulnerable, but no exploit was demonstrated.
- The exploit reporters have since acknowledged that released versions of ESX 4.0 and ESXi 4.0 are not vulnerable.
The truth is, vulnerabilities and exploits will never completely go away for any enterprise software, but ESX has been remarkably resistant to such issues. If it happens again, we'll find the problem and fix it quickly, as we did for CVE-2009-1244. I'll also point out that a guest breakout is a much more serious issue when it drops you into a familiar general-purpose management OS like the Windows Server 2008 parent OS used by Hyper-V than it is with a design like ESXi where an escape grants access to just a thin, hardened hypervisor like our vmkernel. A hypervisor that relies on an OS like Win2008 with a history of regular and recurring remote vulnerabilities will always make an easier target for attackers and should not inspire false confidence.
In response to Microsoft's discussion of their own security practices, I should also point out that VMware has a Secure Software Development Lifecycle (SSDL) as well as a Product Security Policy (PSP) in place. We invite our customers (and Microsoft bloggers) to learn all about it later this month at VMworld 2009 session TA2543 that we've dedicated to the topic. There's really no need for the petty sniping by Microsoft on this topic. Both VMware and Microsoft have rigorous security development processes and both ESX and Hyper-V have achieved the demanding Common Criteria EAL 4 level of certification (actually level 4+ for ESX.)
The last potshot from Microsoft's blog was directed at VMware's policy of requiring review of performance benchmarks of VMware products prior to publication. Microsoft claims we're trying to control and distort the truth. I've explained the rationale behind our benchmark policy previously in this blog. VMware is in no way trying to restrict publication of valid performance data. In fact, we have approved plenty of benchmarks that show other products leading in tests. Our friends at Citrix have submitted multiple test results that we have approved. We'd be happy to do the same for Microsoft, but they have yet to make a request. I wonder why?
OK, that's enough attention paid to the Microsoft blog. I hope to see you all at VMworld shortly where we can all discuss where VMware is going -- with SpringSource and our Cloud initiatives, it's getting pretty exciting around here!