Home > Blogs > VMware vApp Developer Blog > Monthly Archives: October 2009

Delta Disk Support in OVF

OVF packages can become very large when they include several disks.
One way to tackle this issue is delta disk compression, a technique
that exploits the fact that many parts of the disks in an OVF package
contain the same data. For example, the VMs in an OVF package will
often run the same operating system. Generally speaking, delta disk
compression arranges a set of disks in a tree such that data that is
identical across child disks is stored once in their parent disk. In
this way data is only represented once across multiple disks.

Here is a conceptual figure of the virtual disks in a delta disk compressed OVF package and how it looks when it is deployed:

[Figure: DeltaDiskHierachy0]

The OVF package contains two VMs: a Web server and a database. They run
the same operating system and delta disk compression has factored out
the common parts in a separate parent disk shared by the two VMs. When
the OVF package is deployed the tree gets flattened and the Web server
and database each get their own copy of the operating system in the
parent disk.

In this blog post you will learn all about how to design your
OVF package to take advantage of delta disks and how to apply this type
of compression to your package using OVF Tool.

  • We start out with a brief example of what a typical OVF package
    with multiple disks could look like and how delta disk compression
    would reduce its size.
  • Next we look at what delta disk hierarchies are and how they
    are expressed in the OVF descriptor. Understanding them is
    important for making delta disk compression work.
  • Then we give some advice on how to construct your OVF package to utilize delta disk compression.
    • Here we also give some tips on how to shrink your virtual disks to reduce the size of your OVF package.
  • Finally, we show how OVF Tool can delta disk compress your OVF package.

Example

In the example we look at a multi-tiered LAMP stack.
LAMP is an abbreviation for a software bundle comprising Linux, Apache
HTTP Server, MySQL, and PHP. We split the bundle into two VMs: a Web
server VM running Linux, Apache HTTP Server, and PHP as the front-end,
and a database VM running Linux and MySQL as the back-end.
Each VM has a single disk which contains all its data.

Let us assume the two VMs run the same Linux OS (for example Ubuntu
Server 9). Then much of the data on the two disks would be identical
and only the bits concerning the Apache HTTP Server, PHP software, and
MySQL would be different. Here is a rough estimate of how much space
each component will need when stored on a compressed virtual disk:

  • Ubuntu Server 9: 500 MB
  • Apache HTTP Server and PHP: 50 MB
  • MySQL: 50 MB

Distributing the Web server VM and database VM without delta disk compression would now take up about 1,100 MB:

SizeOf(Web server) + SizeOf(Database) = (500 MB + 50 MB) + (500 MB + 50 MB) = 1,100 MB

In this blog post we will explain how this space can be reduced
using the delta disk feature supported by the OVF specification and OVF
Tool. Using delta disk compression we can extract all the components
that are equal in the two VMs (the Linux OS part), only keeping one
copy of them. This leaves us with an OVF package that only takes up
about 600 MB of space.
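
The arithmetic behind the two figures can be sketched in a few lines of Python. This is only an illustration using the rough estimates above. It assumes the operating system part overlaps perfectly between the two disks, which real disks will only approximate, and the function names are made up for this sketch.

```python
# Rough component sizes in MB, taken from the estimates in the text.
OS_MB = 500   # Ubuntu Server
WEB_MB = 50   # Apache HTTP Server and PHP
DB_MB = 50    # MySQL


def size_without_delta(os_mb, extras):
    """Every VM disk carries its own full copy of the operating system."""
    return sum(os_mb + extra for extra in extras)


def size_with_delta(os_mb, extras):
    """The OS is stored once in a shared parent disk (assumes perfect overlap)."""
    return os_mb + sum(extras)


without_delta = size_without_delta(OS_MB, [WEB_MB, DB_MB])
with_delta = size_with_delta(OS_MB, [WEB_MB, DB_MB])
print(without_delta, with_delta)  # 1100 600
```

With more VMs sharing the same operating system the saving grows, since each additional VM only adds its own extras rather than another copy of the OS.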

It is, however, not always as simple as applying delta disk
compression on your OVF package, since it may not yield any reduced
disk space. This is because delta disk compression relies upon how data
is distributed on the disks it works on. In the remainder of the blog
we will explain what delta disks are, how you can optimize your OVF
package to take advantage of delta disk compression, and finally how to
apply delta disk compression to your OVF package with OVF Tool.

Technical Details of Delta Disks

A delta disk hierarchy is a tree of disks like the one in this figure (white areas denote empty disk space):

[Figure: DeltaDiskHierachy1]

In the figure we see a tree with three nodes: Disk 1 (root) with red
data, Disk 2 with blue data and Disk 3 with green data. A disk element
in an OVF descriptor can refer to any of the nodes in the delta disk
hierarchy. For instance, if a disk in the OVF descriptor refers to Disk
3 it will essentially get the flattened Disk 3 shown in the lower half
of the picture when it is deployed. The deployment semantics of a delta
disk node is basically to overlay the nodes in the parent chain
(omitting the white space) from the root all the way down to the chosen
delta disk node. More concretely, in the example to get the flattened
Disk 3, we would first write Disk 1. Then we overwrite this with the
contents of Disk 2 (omitting the empty space) and finally with Disk 3
(omitting the empty space).
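
The overlay semantics can be illustrated with a small Python sketch. It is a toy model rather than how any deployment tool is implemented: each disk is a list of blocks, `EMPTY` stands for the white areas in the figure, and the block layout of the three disks is invented for the example.

```python
EMPTY = None  # stands for the white (empty) areas in the figure


def flatten(chain):
    """Overlay disks from root to leaf; non-empty blocks in later disks win."""
    if not chain:
        raise ValueError("chain must contain at least the root disk")
    size = len(chain[0])
    result = [EMPTY] * size
    for disk in chain:
        if len(disk) != size:
            raise ValueError("all disks in the chain must have the same size")
        for index, block in enumerate(disk):
            if block is not EMPTY:  # empty space never overwrites parent data
                result[index] = block
    return result


# Hypothetical layouts for Disk 1 (root), Disk 2 and Disk 3.
disk1 = ["red", "red", "red", EMPTY, EMPTY, EMPTY]
disk2 = [EMPTY, "blue", EMPTY, "blue", EMPTY, EMPTY]
disk3 = ["green", EMPTY, EMPTY, EMPTY, "green", EMPTY]

flat3 = flatten([disk1, disk2, disk3])
print(flat3)  # ['green', 'blue', 'red', 'blue', 'green', None]
```

Flattening Disk 2 instead would use the chain `[disk1, disk2]`, so the green blocks would not appear in the result.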

In the above paragraph we mention empty space. Empty space is
simply a segment of a disk containing only zeroes. The term can be a
bit misleading, since the segment may actually be in use by the VM
using the disk. However, for all intents and purposes it does not
matter which way we look at it.

In the figure the arrows that tie the disks together are labeled
parentRef. This is also the name of the attribute in the OVF
descriptor that links Disk elements together. It is set on Disk
elements in the DiskSection of the OVF descriptor. This is what the
disk section with the three disks could look like:

<DiskSection>
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:capacity="1073741824"
        ovf:diskId="disk1"
        ovf:fileRef="diskFile1"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="disk2"
        ovf:fileRef="diskFile2"
        ovf:parentRef="disk1"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="disk3"
        ovf:fileRef="diskFile3"
        ovf:parentRef="disk2"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
</DiskSection>
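
To see how the parentRef attributes form a chain, here is a small Python sketch that parses a disk section like the one above and walks from a disk up to its root. It is an illustration only. The namespace URI is an assumption (the OVF 1.x envelope namespace, which the fragment above does not declare), the `ovf:format` attributes are left out for brevity, and `parent_chain` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the OVF 1.x envelope namespace; the fragment in the text omits it.
OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"

DISK_SECTION = f"""
<DiskSection xmlns="{OVF_NS}" xmlns:ovf="{OVF_NS}">
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:capacity="1073741824" ovf:diskId="disk1" ovf:fileRef="diskFile1"/>
  <Disk ovf:capacity="1073741824" ovf:diskId="disk2" ovf:fileRef="diskFile2"
        ovf:parentRef="disk1"/>
  <Disk ovf:capacity="1073741824" ovf:diskId="disk3" ovf:fileRef="diskFile3"
        ovf:parentRef="disk2"/>
</DiskSection>
""".strip()


def parent_chain(section_xml, disk_id):
    """Return the disk IDs from the root down to disk_id, following parentRef."""
    root = ET.fromstring(section_xml)
    parents = {}
    for disk in root.iter(f"{{{OVF_NS}}}Disk"):
        parents[disk.get(f"{{{OVF_NS}}}diskId")] = disk.get(f"{{{OVF_NS}}}parentRef")

    chain = []
    current = disk_id
    while current is not None:
        if current not in parents:
            raise KeyError(f"unknown disk: {current}")
        if current in chain:
            raise ValueError("parentRef attributes form a cycle")
        chain.append(current)
        current = parents[current]  # None for the root disk
    chain.reverse()
    return chain


print(parent_chain(DISK_SECTION, "disk3"))  # ['disk1', 'disk2', 'disk3']
```

The returned list is the order in which the disks would be overlaid when the referenced disk is deployed.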

The LAMP example can be described as this delta disk hierarchy:

[Figure: DeltaDiskHierachy2]

For this setup the disk section could look like this:

<DiskSection>
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:capacity="1073741824"
        ovf:diskId="parentDisk"
        ovf:fileRef="parentDiskFile"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="WebServerDisk"
        ovf:fileRef="WebServerDiskFile"
        ovf:parentRef="parentDisk"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
  <Disk ovf:capacity="1073741824"
        ovf:diskId="DataBaseDisk"
        ovf:fileRef="DatabaseDiskFile"
        ovf:parentRef="parentDisk"
        ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
</DiskSection>

Preparing an OVF Package for Delta Disk Compression

There are some restrictions on delta disk compression that are
important to understand to get the most out of the feature. Firstly,
the disks in a delta disk hierarchy must have the same capacity, so if
you have two disks in your OVF package with different capacities (for
example, one is 4 GB and the other 8 GB) you will not be able to use
delta disk compression on the two disks. Secondly, delta disk
compression only compares disk content located at the same disk
address. A file that is present on two disks but stored at different
positions will not be reduced by delta disk compression. If the file
is at address 0x00670000 on the first disk and at address 0x02D10000
on the other, it will not be detected as a shared block and put in a
parent disk. That only happens if it is at the same address on both
disks (for example, 0x00670000). In other words, for delta disk
compression to work there should be a substantial overlap between
disks at the disk address level.
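
The address-level comparison can be illustrated with a toy Python sketch. This is not OVF Tool's actual algorithm, which the text does not describe. It simply compares two equally sized byte strings block by block and moves blocks that match at the same offset into a parent. The 4-byte block size and the disk contents are invented for the example.

```python
BLOCK = 4  # hypothetical, tiny block size chosen for readability


def split_blocks(data, block=BLOCK):
    """Cut a disk image into consecutive fixed-size blocks."""
    return [data[i:i + block] for i in range(0, len(data), block)]


def factor_common(disk_a, disk_b, block=BLOCK):
    """Move blocks that are identical at the same offset into a parent disk."""
    if len(disk_a) != len(disk_b):
        raise ValueError("disks must have the same capacity")
    parent, child_a, child_b = [], [], []
    for a, b in zip(split_blocks(disk_a, block), split_blocks(disk_b, block)):
        zero = bytes(len(a))  # an all-zero ("empty") block
        if a == b:
            parent.append(a)
            child_a.append(zero)
            child_b.append(zero)
        else:
            parent.append(zero)
            child_a.append(a)
            child_b.append(b)
    return parent, child_a, child_b


# "LIBS" exists on both disks, but at different offsets, so it is not shared.
web = b"BOOT" + b"LIBS" + b"HTTP" + bytes(4)
db = b"BOOT" + b"SQL!" + b"LIBS" + bytes(4)

parent, web_child, db_child = factor_common(web, db)
shared = sum(1 for blk in parent if any(blk))
print(shared)  # 1
```

Only the `BOOT` block ends up in the parent, even though `LIBS` is present on both disks. This is the effect described above.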

The second requirement can be difficult to satisfy if you
are not careful in how you construct the OVF package, but there are
ways to do it. To explain how, let us first look at the LAMP stack
example that we looked at in the beginning of the blog post, to see how
we can prepare it for delta disk compression. This LAMP stack had a
Linux VM running Apache HTTP Server and PHP and another Linux VM
running MySQL. Each VM had a single disk.

To achieve optimal disks for delta disk compression we first
create a plain Linux VM with one disk. We clone the VM so we now have
two plain Linux VMs. On one of them we install Apache HTTP Server and
PHP and on the other we install MySQL. By doing this we satisfy the
first criterion, that the disks have the same capacity (the Web server
VM's disk and the database VM's disk come from the same original plain
Linux VM), and the second criterion, that a significant part of the
disks overlap. The files from the OS part of the plain Linux install
are at the same position on both disks, since they were not changed
when we installed the Apache HTTP Server, PHP, and MySQL (or at least,
the majority of them have not changed).

The above example is rather canonical in how you achieve the
best results from delta disk compression when having multiple VMs using
the same operating system, so to summarize:

  1. Install a plain operating system in a VM;
  2. Clone the plain VM the number of times you need for your solution;
  3. Install the remaining software specific to each VM.

The reason why we first install a plain operating system and then
clone it to the number of VMs we need, rather than installing the same
operating system multiple times on each of the VMs we need, is that we
cannot in general be sure that the files are put at the exact same
location on the disks, even though it is the same operating system.

If cloning is not an option when making the OVF package then
perhaps VMware Studio is. It can create VMs well suited for delta disk
compression, since it builds the VM's operating system and other
software components in a scripted manner that can be replayed to
produce almost identical VMs.

Shrinking the Disks

When you export your VMs in your OVF package you want to make sure
that all unused space is zeroed out, since zeroes compress really well
in the VMDK disk format. However, swap space and deleted files often
still take up space on disk, since most operating systems do not
eagerly zero them out by default. This means that even though your VM
says it only uses about 500 MB it may actually take up a lot more
space. Even worse, you may have confidential information on, e.g.,
your swap drive or in old deleted files that you do not want to
distribute with the OVF package. There are several ways to solve this
problem. On most Linux distributions you can do the following to clean
up a disk before you export the VM: 1) turn off swap; 2) write a
single file containing only zeroes to disk, as large as possible;
3) delete the file immediately after you have created it. On the
command line you can do these three steps by invoking these commands:

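  1. /sbin/swapoff -a (this disables all swap devices)
  2. dd if=/dev/zero of=zeroFile.tmp (this runs until the disk is full and then stops with a "No space left on device" error, which is expected)
  3. rm zeroFile.tmp

On a Windows system it can be done in various ways. We will consider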
Windows Server 2008, but it can be applied with modifications on other
types of Windows systems.

We start out by installing VMware Tools on the Windows
Server 2008 VM. When it is installed, open VMware Tools and choose
“Shrink…”. This will zero out the disk. To zero out the swap disk you
need to set an option under Administrative Tools. Go to Administrative
Tools -> Local Security Policy -> Security Settings ->
Security Options and enable the policy “Shutdown: Clear virtual memory
pagefile”. When you shut down the VM the swap disk will then be zeroed
out. Please note, however, that enabling this option will increase the
shutdown time significantly for large swap disks. One way of working
around this problem could be to first delete the swap disk, reboot the
VM, disable the option again, and then shut down the VM before putting
it in an OVF package (hoping that no data is written to the swap disk
in the meantime).

Creating an OVF Package with Delta Disk Compression using OVF Tool

Up until now we have not explained how to actually construct an OVF
package with delta disk compression, only what parts go into it. Even
though the ideas behind delta disks can seem a bit complicated it is
quite easy to use delta disk compression in your OVF package by using
OVF Tool. Basically, you use the option --makeDeltaDisks. Source may be
an OVF descriptor, a VMX descriptor, or a VIM source (for example, a VM
or vApp in vSphere). Target must be a directory. For example, we can
use delta disk compression on our LAMP OVF package by invoking this
command:

ovftool --makeDeltaDisks LAMP.ovf output-dir/

This will create a new OVF package in the output directory with delta
disk compression. That is, both the disks and the OVF descriptor are
updated, which means you can take the OVF package written in the output
directory and deploy it immediately without any manual post processing.
There are no restrictions on the type of disks in the input you give
OVF Tool. It will try to create delta disk trees of all the disks
in the input OVF package and output the optimal OVF package in terms of
delta disk compression.

The disks that OVF Tool generates are compressed in the
VMDK virtual disk format, but it is possible to apply a second layer of
compression, which may yield even smaller disks, by using the --compress
option. Use --compress=9 for the best compression. On a package the size
of the LAMP OVF package (about 600 MB) it would yield about 30-40 MB
less disk space (in our experience). Delta disk compressing our LAMP
OVF package with this extra option would then simply be done by
invoking:

ovftool --makeDeltaDisks --compress=9 LAMP.ovf output-dir/
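
If you build OVF packages from a script, the invocation can be assembled programmatically. The following Python sketch only builds the argument list from the two options mentioned in this post and does not run anything. `delta_disk_command` is a made-up helper, and the check that the compression level lies between 1 and 9 is an assumption, since the post only states that 9 gives the best compression.

```python
import shlex


def delta_disk_command(source, target_dir, compress=None):
    """Build an ovftool argument list using only the options named in the post."""
    args = ["ovftool", "--makeDeltaDisks"]
    if compress is not None:
        # Assumption: valid levels are 1..9; the post only mentions 9 as the best.
        if not 1 <= compress <= 9:
            raise ValueError("compress must be between 1 and 9")
        args.append(f"--compress={compress}")
    args += [source, target_dir]  # the target must be a directory
    return args


cmd = delta_disk_command("LAMP.ovf", "output-dir/", compress=9)
print(shlex.join(cmd))
# ovftool --makeDeltaDisks --compress=9 LAMP.ovf output-dir/
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided OVF Tool is installed and on the path.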

Learn more about what OVF Tool can do by going to the OVF forum at VMware: http://communities.vmware.com/community/developer/forums/ovf. Here you can ask questions about OVF Tool and related products.