VMware


10/16/2009

Delta Disk Support in OVF

OVF packages can become very large when they include several disks. One way to tackle this is delta disk compression, a technique that exploits the fact that the disks in an OVF package often contain much of the same data. For example, the VMs in an OVF package will often run the same operating system. Generally speaking, delta disk compression arranges a set of disks in a tree such that data that is identical across child disks is stored once in a shared parent disk. In this way, shared data is represented only once across multiple disks.

Here is a conceptual figure of the virtual disks in a delta disk compressed OVF package and how it looks when it is deployed:

DeltaDiskHierachy0
The OVF package contains two VMs: a Web server and a database. They run the same operating system, and delta disk compression has factored out the common parts into a separate parent disk shared by the two VMs. When the OVF package is deployed, the tree is flattened and the Web server and database each get their own copy of the operating system from the parent disk.

In this blog post you will learn all about how to design your OVF package to take advantage of delta disks and how to apply this type of compression to your package using OVF Tool.

  • We start out with a brief example of what a typical OVF package with multiple disks could look like and how delta disk compression would reduce its size.
  • Next we look at what delta disk hierarchies are and how they are expressed in the OVF descriptor. Understanding them is important for making delta disk compression work.
  • Then we give some advice on how you should construct your OVF package to utilize delta disk compression.
    • Here we also give some tips on how to shrink your virtual disks to reduce the space of your OVF package.
  • Finally, we show how OVF Tool can delta disk compress your OVF package.

Example

In the example we look at a multi-tiered LAMP stack. LAMP is an abbreviation for a software bundle comprising Linux, Apache HTTP Server, MySQL, and PHP. We split the bundle into two VMs: a Web server VM running Linux, Apache HTTP Server, and PHP as the front-end, and a database VM running Linux and MySQL as the back-end. Each VM has a single disk which contains all its data.

Let us assume the two VMs run the same Linux OS (for example Ubuntu Server 9). Then much of the data on the two disks would be identical and only the bits concerning the Apache HTTP Server, PHP software, and MySQL would be different. Here is a rough estimate of how much space each component will need when stored on a compressed virtual disk:

  • Ubuntu Server 9: 500 MB.
  • Apache HTTP Server and PHP: 50 MB
  • MySQL: 50 MB
Distributing the Web server VM and database VM without delta disk compression would now take up about 1,100 MB:

SizeOf(Web server) + SizeOf(Database) = (500 MB + 50 MB) + (500 MB + 50 MB) = 1,100 MB

In this blog post we will explain how this space can be reduced using the delta disk feature supported by the OVF specification and OVF Tool. Using delta disk compression we can extract all the components that are equal in the two VMs (the Linux OS part) and keep only one copy of them. This leaves us with an OVF package that takes up only about 600 MB of space (500 MB + 50 MB + 50 MB).
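The arithmetic above can be written out as a small Python sketch. The function and constant names are made up for this illustration, and the sizes are the rough estimates from the list above, not measurements.

```python
# Rough component sizes in MB, taken from the estimates above.
OS_MB = 500
WEB_STACK_MB = 50   # Apache HTTP Server and PHP
DATABASE_MB = 50    # MySQL


def size_without_delta(os_mb, per_vm_mb):
    """Every VM disk carries its own full copy of the operating system."""
    return sum(os_mb + extra for extra in per_vm_mb)


def size_with_delta(os_mb, per_vm_mb):
    """The operating system is stored once, in a shared parent disk."""
    return os_mb + sum(per_vm_mb)


per_vm = [WEB_STACK_MB, DATABASE_MB]
print(size_without_delta(OS_MB, per_vm))  # 1100
print(size_with_delta(OS_MB, per_vm))     # 600
```

The saving grows with the number of VMs that share the same operating system, since the shared part is only counted once.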

It is, however, not always as simple as applying delta disk compression on your OVF package, since it may not yield any reduced disk space. This is because delta disk compression relies upon how data is distributed on the disks it works on. In the remainder of the blog we will explain what delta disks are, how you can optimize your OVF package to take advantage of delta disk compression, and finally how to apply delta disk compression to your OVF package with OVF Tool.

Technical Details of Delta Disks

A delta disk hierarchy is a tree of disks like in this figure (white areas denote empty disk space):
DeltaDiskHierachy1

In the figure we see a tree with three nodes: Disk 1 (the root) with red data, Disk 2 with blue data, and Disk 3 with green data. A disk element in an OVF descriptor can refer to any of the nodes in the delta disk hierarchy. For instance, if a disk in the OVF descriptor refers to Disk 3, it will essentially get the flattened Disk 3 shown in the lower half of the picture when it is deployed. The deployment semantics of a delta disk node is basically to overlay the nodes in the parent chain (omitting the white space) from the root all the way down to the chosen delta disk node. More concretely, to get the flattened Disk 3 in the example, we would first write Disk 1. Then we overwrite this with the contents of Disk 2 (omitting the empty space) and finally with the contents of Disk 3 (omitting the empty space).

In the above paragraph we mention empty space. Empty space is simply a segment of a disk containing only zeroes. The term can be a bit misleading, since such a segment may actually be in use by the VM using the disk. However, for all intents and purposes it does not matter which way we look at it.
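The overlay semantics can be sketched in a few lines of Python. This is only an illustration of the idea, not how OVF Tool or any deployment target implements it: disks are modeled as byte strings, the 4-byte block size is chosen purely to keep the example readable, and the function names are hypothetical.

```python
BLOCK = 4  # bytes per block; tiny on purpose, chosen only for this illustration


def is_empty(block: bytes) -> bool:
    """A block counts as empty space when it contains only zeroes."""
    return not any(block)


def flatten(chain):
    """Overlay the disks in `chain`, ordered from the root down to the chosen node.

    Non-empty blocks of a later disk overwrite whatever earlier disks
    wrote at the same address; empty blocks are skipped.
    """
    result = bytearray(chain[0])
    for disk in chain[1:]:
        if len(disk) != len(result):
            raise ValueError("all disks in a hierarchy must have the same capacity")
        for offset in range(0, len(disk), BLOCK):
            block = disk[offset:offset + BLOCK]
            if not is_empty(block):
                result[offset:offset + BLOCK] = block
    return bytes(result)


EMPTY = bytes(BLOCK)
disk1 = b"RRRR" + b"RRRR" + EMPTY + EMPTY    # root, "red" data
disk2 = EMPTY + b"BBBB" + b"BBBB" + EMPTY    # child of disk1, "blue" data
disk3 = EMPTY + EMPTY + EMPTY + b"GGGG"      # child of disk2, "green" data

print(flatten([disk1, disk2, disk3]))  # b'RRRRBBBBBBBBGGGG'
```

Flattening Disk 2 instead would use the chain `[disk1, disk2]`, which leaves the last block empty.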

In the figure the arrows that tie the disks together are annotated with parentRef. This is also the name of the attribute that links Disk elements together in the DiskSection of the OVF descriptor. This is what the disk section with the three disks could look like:

<DiskSection>
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:capacity="1073741824"
        ovf:diskId="disk1"
        ovf:fileRef="diskFile1"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="disk2"
        ovf:fileRef="diskFile2"
        ovf:parentRef="disk1"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="disk3"
        ovf:fileRef="diskFile3"
        ovf:parentRef="disk2"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
</DiskSection>
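To see how the parentRef links form a chain, here is a minimal Python sketch that reads a disk section and walks from a disk up to its root. It assumes the OVF 1.x envelope namespace for both the elements and the ovf: prefix, and it leaves out the capacity and format attributes for brevity. The function name parent_chain is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the OVF 1.x envelope namespace, bound to both the default
# namespace and the ovf: prefix, as in a typical OVF descriptor.
OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"

DISK_SECTION = f"""
<DiskSection xmlns="{OVF_NS}" xmlns:ovf="{OVF_NS}">
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:diskId="disk1" ovf:fileRef="diskFile1"/>
  <Disk ovf:diskId="disk2" ovf:fileRef="diskFile2" ovf:parentRef="disk1"/>
  <Disk ovf:diskId="disk3" ovf:fileRef="diskFile3" ovf:parentRef="disk2"/>
</DiskSection>
"""


def qualified(name):
    """Return the namespace-qualified name ElementTree uses internally."""
    return f"{{{OVF_NS}}}{name}"


def parent_chain(section_xml, disk_id):
    """Return the disk ids from the root down to `disk_id`."""
    root = ET.fromstring(section_xml.strip())
    parents = {
        disk.get(qualified("diskId")): disk.get(qualified("parentRef"))
        for disk in root.iter(qualified("Disk"))
    }
    chain = []
    current = disk_id
    while current is not None:
        if current in chain:
            raise ValueError(f"parentRef cycle at {current!r}")
        chain.append(current)
        current = parents[current]  # KeyError if the id is unknown
    chain.reverse()
    return chain


print(parent_chain(DISK_SECTION, "disk3"))  # ['disk1', 'disk2', 'disk3']
```

The returned order is the order in which the disks would be overlaid when the chosen disk is flattened at deployment.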

The LAMP example can be described as this delta disk hierarchy:

DeltaDiskHierachy2
For this setup the disk section could look like this:
<DiskSection>
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:capacity="1073741824"
        ovf:diskId="parentDisk"
        ovf:fileRef="parentDiskFile"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="WebServerDisk"
        ovf:fileRef="WebServerDiskFile"
        ovf:parentRef="parentDisk"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="DataBaseDisk"
        ovf:fileRef="DatabaseDiskFile"
        ovf:parentRef="parentDisk"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
</DiskSection>

Preparing an OVF Package for Delta Disk Compression

Delta disk compression has some restrictions that are important to understand to get the most out of the feature. First, all disks in a delta disk hierarchy must have the same capacity, so if your OVF package has two disks with different capacities (for example, one is 4 GB and the other 8 GB) you cannot use delta disk compression on those two disks. Second, delta disk compression only compares disk content at the same disk address. Even if the same file is on two different disks, it will not be factored out unless it is stored at the same place on both. For example, if the file is at address 0x00670000 on the first disk and at address 0x02D10000 on the other, it will not be detected as a shared block and put in a parent disk; that only happens if it is at the same address on both disks (for example, 0x00670000). In other words, for delta disk compression to work there should be substantial overlap between the disks at the disk address level.
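Both restrictions show up in a toy version of the comparison. The following Python sketch is a simplified model under our own assumptions, not OVF Tool's actual algorithm: disks are byte strings, blocks are 4 bytes, and factor_common and used_blocks are hypothetical helper names.

```python
def factor_common(disk_a, disk_b, block=4):
    """Split two disks into a shared parent and two children, block by block.

    Only blocks that are identical at the same address go into the parent.
    """
    if len(disk_a) != len(disk_b):
        raise ValueError("disks in a delta disk hierarchy must have the same capacity")
    parent = bytearray(len(disk_a))
    child_a = bytearray(len(disk_a))
    child_b = bytearray(len(disk_a))
    for offset in range(0, len(disk_a), block):
        a = disk_a[offset:offset + block]
        b = disk_b[offset:offset + block]
        if a == b:
            parent[offset:offset + block] = a
        else:
            child_a[offset:offset + block] = a
            child_b[offset:offset + block] = b
    return bytes(parent), bytes(child_a), bytes(child_b)


def used_blocks(disk, block=4):
    """Count the blocks that contain something other than zeroes."""
    return sum(any(disk[i:i + block]) for i in range(0, len(disk), block))


EMPTY = bytes(4)

# Same file content ("FILE") on both disks, but at different addresses.
web = b"LNUX" + b"FILE" + EMPTY
db = b"LNUX" + EMPTY + b"FILE"
parent, web_delta, db_delta = factor_common(web, db)
print(used_blocks(parent), used_blocks(web_delta), used_blocks(db_delta))  # 1 1 1

# Same content at the same address: everything ends up in the parent.
parent, web_delta, db_delta = factor_common(web, web)
print(used_blocks(parent), used_blocks(web_delta), used_blocks(db_delta))  # 2 0 0
```

In the first case the file is stored twice even though its content is identical, because the two copies sit at different addresses.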

The second requirement can be difficult to satisfy if you are not careful in how you construct the OVF package, but there are ways to do it. To explain how, let us first look at the LAMP stack example that we looked at in the beginning of the blog post, to see how we can prepare it for delta disk compression. This LAMP stack had a Linux VM running Apache HTTP Server and PHP and another Linux VM running MySQL. Each VM had a single disk.

To achieve optimal disks for delta disk compression we first create a plain Linux VM with one disk. We clone the VM so we now have two plain Linux VMs. On one of them we install Apache HTTP Server and PHP, and on the other we install MySQL. By doing this we satisfy the first criterion, that the disks have the same capacity (the Web server VM’s disk and the database VM’s disk come from the same original plain Linux VM), and the second criterion, that a significant part of the disks overlap each other. The files from the OS part of the plain Linux install are at the same position on both disks, since they were not changed when we installed Apache HTTP Server, PHP, and MySQL (or at least, the majority of them were not changed).

The above example is rather canonical in how you achieve the best results from delta disk compression when having multiple VMs using the same operating system, so to summarize:

  1. Install a plain operating system in a VM;
  2. Clone the plain VM the number of times you need for your solution;
  3. Install the remaining software specific to each VM.
The reason why we first install a plain operating system and then clone it to the number of VMs we need, rather than installing the same operating system multiple times on each of the VMs we need, is that we cannot in general be sure that the files are put at the exact same location on the disks, even though it is the same operating system.

If cloning is not an option when making the OVF package then perhaps VMware Studio is. It can create VMs well suited for delta disk compression, since it builds the VM’s operating system and other software components in a scripted manner that can be replayed to produce almost identical VMs.

Shrinking the Disks

When you export the VMs in your OVF package you want to make sure that all unused space is zeroed out, since zeroes compress really well in the VMDK disk format. However, swap space and deleted files often still take up space on disk, since most operating systems do not eagerly zero them out by default. This means that even though your VM says it only uses about 500 MB, it may actually take up a lot more space. Even worse, you may have confidential information on, e.g., your swap drive or in old deleted files that you do not want to distribute with the OVF package. There are several ways to solve this problem. On most Linux distributions you can clean up a disk before you export the VM in three steps: 1) disable the swap drive; 2) write a single file containing only zeroes that is as large as possible; 3) delete the file immediately after you created it. On the command line you can do these three steps by invoking these commands:

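  1. /sbin/swapoff -a (this will disable all swap devices)
  2. dd if=/dev/zero of=zeroFile.tmp (this runs until the disk is full and then stops with a "No space left on device" error, which is expected)
  3. rm zeroFile.tmp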
On a Windows system it can be done in various ways. We will consider Windows Server 2008, but it can be applied with modifications on other types of Windows systems.

We start out by installing VMware Tools on the Windows Server 2008 VM. When it is installed, open VMware Tools and choose “Shrink…”. This will zero out the disk. To zero out the swap disk you need to set an option under Administrative Tools. Go to Administrative Tools -> Local Security Policy -> Security Settings -> Security Options and enable the policy “Shutdown: Clear virtual memory pagefile”. When you shut down the VM the swap disk will then be zeroed out. Please note, however, that enabling this option will increase the shutdown time significantly for large swap disks. One possible workaround is to first delete the swap disk, then reboot the VM and disable the option again (hoping that no data is written to the swap disk in the meantime), and finally shut down the VM before putting it in an OVF package.

Creating an OVF Package with Delta Disk Compression using OVF Tool

Up until now we have not explained how to actually construct an OVF package with delta disk compression, only what parts go into it. Even though the ideas behind delta disks can seem a bit complicated, it is quite easy to use delta disk compression in your OVF package with OVF Tool. Basically, you use the option --makeDeltaDisks together with a source and a target. The source may be an OVF descriptor, a VMX descriptor, or a VIM source (for example, a VM or vApp in vSphere). The target must be a directory. For example, we can use delta disk compression on our LAMP OVF package by invoking this command:
ovftool --makeDeltaDisks LAMP.ovf output-dir/
This will create a new OVF package with delta disk compression in the output directory. Both the disks and the OVF descriptor are updated, which means you can take the OVF package written to the output directory and deploy it immediately without any manual post-processing. There are no restrictions on the type of input you give OVF Tool in terms of disks. It will try to create delta disk trees of all the disks in the input OVF package and output the optimal OVF package in terms of delta disk compression.

The disks that OVF Tool generates are compressed in the VMDK virtual disk format, but it is possible to apply a second layer of compression, which may yield even smaller disks, by using the --compress option. Use --compress=9 for the best compression. On a package the size of the LAMP OVF package (about 600 MB) it would save about 30-40 MB of disk space (in our experience). Delta disk compressing our LAMP OVF package with this extra option would then simply be done by invoking:

ovftool --makeDeltaDisks --compress=9 LAMP.ovf output-dir/
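If you want to drive OVF Tool from a build script, the command line can be assembled programmatically. The following Python sketch only builds and prints the command; the helper name ovftool_delta_command is made up, and the accepted compression range of 1 to 9 is our assumption (this post only states that 9 gives the best compression).

```python
import shlex


def ovftool_delta_command(source, target_dir, compress=None):
    """Build the argument list for a delta disk compression run of OVF Tool."""
    command = ["ovftool", "--makeDeltaDisks"]
    if compress is not None:
        # Assumption: valid compression levels are 1 through 9.
        if not 1 <= compress <= 9:
            raise ValueError("compress must be between 1 and 9")
        command.append(f"--compress={compress}")
    command += [source, target_dir]
    return command


command = ovftool_delta_command("LAMP.ovf", "output-dir/", compress=9)
print(shlex.join(command))
# ovftool --makeDeltaDisks --compress=9 LAMP.ovf output-dir/
```

To actually run it, the list can be passed to subprocess.run(command, check=True), provided ovftool is on the PATH.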
Learn more about what OVF Tool can do by going to the OVF forum at VMware: http://communities.vmware.com/community/developer/forums/ovf. Here you can ask questions about OVF Tool and related products.
