Author Archives: Kristian Bisgaard Lassen

Delta Disk Support in OVF

OVF packages can become very large when they include several disks.
One way to tackle this issue is by using delta disk compression. It is
a technique that utilizes the fact that many parts of the disks in an
OVF package contain the same data. For example, a collection of VMs in
an OVF package will often run the same kind of operating system.
Generally speaking, delta disk compression arranges a set of disks in a
tree such that data that is identical across child nodes is stored once
in a shared parent node. In this way data is only represented once
across multiple disks.

Here is a conceptual figure of the virtual disks in a delta disk compressed OVF package and how it looks when it is deployed:

DeltaDiskHierachy0

The OVF package contains two VMs: A Web server and a database. They run
the same operating system and delta disk compression has factored out
the common parts in a separate parent disk shared by the two VMs. When
the OVF package is deployed the tree gets flattened and the Web server
and database each get their own copy of the operating system in the
parent disk.

In this blog post you will learn all about how to design your
OVF package to take advantage of delta disks and how to apply this type
of compression to your package using OVF Tool.

  • We start out by looking at a brief example of what a typical OVF
    package with multiple disks could look like and how delta disk
    compression would reduce its size.
  • Next we look at what delta disk hierarchies are and how they
    are expressed in the OVF descriptor. Understanding them is important
    for making delta disk compression work.
  • Then we give some advice on how you should construct your OVF package to utilize delta disk compression.
    • Here we also give some tips on how to shrink your virtual disks to reduce the size of your OVF package.
  • Finally, we show how OVF Tool can delta disk compress your OVF package.

Example

In the example we look at a multi-tiered LAMP stack.
LAMP is an abbreviation for a software bundle comprising Linux, Apache
HTTP Server, MySQL, and PHP. We split the bundle into two VMs: a Web
server VM running Linux, Apache HTTP Server, and PHP serving as the
front-end and a database VM running Linux and MySQL as the back-end.
Each VM has a single disk which contains all its data.

Let us assume the two VMs run the same Linux OS (for example Ubuntu
Server 9). Then much of the data on the two disks would be identical
and only the bits concerning the Apache HTTP Server, PHP software, and
MySQL would be different. Here is a rough estimate of how much space
each component will need when stored on a compressed virtual disk:

  • Ubuntu Server 9: 500 MB
  • Apache HTTP Server and PHP: 50 MB
  • MySQL: 50 MB

Distributing the Web server VM and database VM without delta disk compression would now take up about 1,100 MB:

SizeOf(Web server) + SizeOf(Database) = (500 MB + 50 MB) + (500 MB + 50 MB) = 1,100 MB

In this blog post we will explain how this space can be reduced
using the delta disk feature supported by the OVF specification and OVF
Tool. Using delta disk compression we can extract all the components
that are equal in the two VMs (the Linux OS part), only keeping one
copy of them. This leaves us with an OVF package that only takes up
about 600 MB of space.
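The estimate above can be written out as a small Python sketch. It only restates the rough component sizes from the list; the function names are hypothetical and chosen for illustration.

```python
# Rough component sizes in MB, taken from the estimates above.
OS_MB = 500   # Ubuntu Server, identical on both VMs
WEB_MB = 50   # Apache HTTP Server and PHP
DB_MB = 50    # MySQL


def size_without_delta(os_mb, extras):
    """Every VM disk carries its own full copy of the operating system."""
    return sum(os_mb + extra for extra in extras)


def size_with_delta(os_mb, extras):
    """The OS is stored once in a parent disk; each child stores only its extras."""
    return os_mb + sum(extras)


plain = size_without_delta(OS_MB, [WEB_MB, DB_MB])
delta = size_with_delta(OS_MB, [WEB_MB, DB_MB])
print(plain, delta, plain - delta)
```

With these numbers the package shrinks from 1,100 MB to 600 MB. The saving grows with every additional VM that shares the same operating system.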

It is, however, not always as simple as applying delta disk
compression on your OVF package, since it may not yield any reduced
disk space. This is because delta disk compression relies upon how data
is distributed on the disks it works on. In the remainder of the blog
we will explain what delta disks are, how you can optimize your OVF
package to take advantage of delta disk compression, and finally how to
apply delta disk compression to your OVF package with OVF Tool.

Technical Details of Delta Disks

A delta disk hierarchy is a tree of disks like in this figure (white areas denote empty disk space):
DeltaDiskHierachy1

In the figure we see a tree with three nodes: Disk 1 (root) with red
data, Disk 2 with blue data and Disk 3 with green data. A disk element
in an OVF descriptor can refer to any of the nodes in the delta disk
hierarchy. For instance, if a disk in the OVF descriptor refers to Disk
3 it will essentially get the flattened Disk 3 shown in the lower half
of the picture when it is deployed. The deployment semantics of a delta
disk node is basically to overlay the nodes in the parent chain
(omitting the white space) from the root all the way down to the chosen
delta disk node. More concretely, in the example to get the flattened
Disk 3, we would first write Disk 1. Then we overwrite this with the
contents of Disk 2 (omitting the empty space) and finally with Disk 3
(omitting the empty space).

In the above paragraph we mention empty space. Empty space is
simply a segment of a disk containing only zeroes. The term can be a
bit misleading, since such a segment may actually be in use by the VM
using the disk. However, for all intents and purposes it does not
matter which way we look at it.
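The overlay semantics can be illustrated with a minimal Python sketch. It models each disk as a list of tiny fixed-size blocks, which is an assumption made purely for readability; the names `flatten` and `is_empty` are hypothetical and this is not how any particular deployment platform implements it.

```python
BLOCK = 4  # hypothetical block size in bytes, tiny for readability


def is_empty(block):
    """A block counts as empty space when it contains only zero bytes."""
    return not any(block)


def flatten(chain):
    """Overlay disks from the root down to the chosen node.

    `chain` lists the disks root first; every disk is a list of equally
    sized blocks. Non-empty blocks of a later disk overwrite earlier ones.
    """
    result = list(chain[0])
    for disk in chain[1:]:
        for index, block in enumerate(disk):
            if not is_empty(block):
                result[index] = block
    return result


EMPTY = bytes(BLOCK)
disk1 = [b"RRRR", b"RRRR", EMPTY, EMPTY]  # root, "red" data
disk2 = [EMPTY, b"BBBB", b"BBBB", EMPTY]  # child of Disk 1, "blue" data
disk3 = [EMPTY, EMPTY, b"GGGG", b"GGGG"]  # child of Disk 2, "green" data

flat3 = flatten([disk1, disk2, disk3])
print(flat3)
```

In this toy layout the flattened Disk 3 keeps the first red block from Disk 1, one blue block from Disk 2, and both green blocks from Disk 3.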

In the figure, parentRefs annotate the arrows that tie the disks
together. parentRef is also the name of the attribute in the OVF
descriptor that links Disk elements together; it is used on Disk
elements in the DiskSection of the OVF descriptor. This is what the
disk section with the three disks could look like:

<DiskSection>
<Info>Meta-information about the virtual disks</Info>
<Disk ovf:capacity="1073741824"
ovf:diskId="disk1"
ovf:fileRef="diskFile1"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
<Disk ovf:capacity="1073741824"
ovf:diskId="disk2"
ovf:fileRef="diskFile2"
ovf:parentRef="disk1"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized"/>
<Disk ovf:capacity="1073741824"
ovf:diskId="disk3"
ovf:fileRef="diskFile3"
ovf:parentRef="disk2"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
</DiskSection>

The LAMP example can be described as this delta disk hierarchy:

DeltaDiskHierachy2

For this setup the disk section could look like this:
<DiskSection>
<Info>Meta-information about the virtual disks</Info>
<Disk ovf:capacity="1073741824"
ovf:diskId="parentDisk"
ovf:fileRef="parentDiskFile"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
<Disk ovf:capacity="1073741824"
ovf:diskId="WebServerDisk"
ovf:fileRef="WebServerDiskFile"
ovf:parentRef="parentDisk"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized"/>
<Disk ovf:capacity="1073741824"
ovf:diskId="DataBaseDisk"
ovf:fileRef="DatabaseDiskFile"
ovf:parentRef="parentDisk"
ovf:format="http://www.vmware.com/interfaces/specifications/vmdk.html#streamOptimized" />
</DiskSection>
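To see how the parentRef attributes form a tree, here is a minimal Python sketch that reads a disk section like the one above and lists the parent chain of each disk, root first. The snippets in this post omit namespace declarations, so the sketch assumes the OVF 1.0 envelope namespace; the function name `parent_chains` is hypothetical, and the sketch assumes the references contain no cycles.

```python
import xml.etree.ElementTree as ET

# Assumption: the OVF 1.0 envelope namespace (not shown in the snippets above).
OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"

DISK_SECTION = f"""
<DiskSection xmlns="{OVF_NS}" xmlns:ovf="{OVF_NS}">
  <Info>Meta-information about the virtual disks</Info>
  <Disk ovf:diskId="parentDisk" ovf:fileRef="parentDiskFile"/>
  <Disk ovf:diskId="WebServerDisk" ovf:fileRef="WebServerDiskFile"
        ovf:parentRef="parentDisk"/>
  <Disk ovf:diskId="DataBaseDisk" ovf:fileRef="DatabaseDiskFile"
        ovf:parentRef="parentDisk"/>
</DiskSection>
"""


def parent_chains(xml_text):
    """Map each diskId to its chain of diskIds, root first."""
    root = ET.fromstring(xml_text)
    parents = {}
    for disk in root.findall(f"{{{OVF_NS}}}Disk"):
        disk_id = disk.get(f"{{{OVF_NS}}}diskId")
        parents[disk_id] = disk.get(f"{{{OVF_NS}}}parentRef")  # None for a root
    chains = {}
    for disk_id in parents:
        chain, current = [], disk_id
        while current is not None:  # walk up until we reach the root
            chain.append(current)
            current = parents[current]
        chains[disk_id] = chain[::-1]
    return chains


chains = parent_chains(DISK_SECTION)
for disk_id, chain in chains.items():
    print(disk_id, "<-", " -> ".join(chain))
```

The chain for each disk is exactly the order in which the disks would be overlaid when that disk is deployed.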

Preparing an OVF Package for Delta Disk Compression

There are some restrictions of delta disk compression that are
important to understand to get the most out of the feature. Firstly,
the disks in a delta disk hierarchy must have the same capacity, so if
you have two disks in your OVF package with different capacity (for
example, one is 4 GB and the other 8 GB) you will not be able to use
delta disk compression on the two disks. Secondly, delta disk
compression only compares disk content located at the same disk
address. For example, if the same file is stored on two disks but at
different addresses, it will not be reduced by delta disk compression.
If the file is at address 0x00670000 on the first disk and at address
0x02D10000 on the other, it will not be detected as a shared block and
put in a parent disk; that only happens if it is at the same address on
both disks (for example, 0x00670000). In other words, for delta disk
compression to work there should be a substantial overlap between disks
at the disk address level.
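Both restrictions can be illustrated with a small Python sketch that factors a shared parent out of two disks by comparing them block by block at the same offset. This is a simplified model under our own assumptions (a tiny hypothetical block size, in-memory byte strings, hypothetical function names); it is not a description of how OVF Tool works internally.

```python
BLOCK = 4  # hypothetical block size in bytes, tiny for readability


def split_blocks(data):
    """Cut a disk image into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def factor_out_parent(disk_a, disk_b):
    """Return (parent, child_a, child_b) as lists of blocks.

    A block goes into the parent only if both disks hold identical data
    at the same offset; otherwise each child keeps its own block.
    """
    if len(disk_a) != len(disk_b):
        raise ValueError("disks in a delta disk hierarchy must have the same capacity")
    parent, child_a, child_b = [], [], []
    for a, b in zip(split_blocks(disk_a), split_blocks(disk_b)):
        empty = bytes(len(a))
        if a == b:
            parent.append(a)
            child_a.append(empty)
            child_b.append(empty)
        else:
            parent.append(empty)
            child_a.append(a)
            child_b.append(b)
    return parent, child_a, child_b


def shared_blocks(parent):
    """Count the non-empty blocks that ended up in the parent."""
    return sum(1 for block in parent if any(block))


web = b"OSOS" + b"LIBS" + b"HTTP" + bytes(4)
db = b"OSOS" + b"LIBS" + bytes(4) + b"SQL!"
db_shifted = bytes(4) + b"OSOS" + b"LIBS" + b"SQL!"  # same OS data, other addresses

aligned = shared_blocks(factor_out_parent(web, db)[0])
shifted = shared_blocks(factor_out_parent(web, db_shifted)[0])
print(aligned, shifted)
```

When the operating system data sits at the same addresses, two blocks move to the parent. When the same data is shifted by one block, nothing is shared, even though the content is identical.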

The second requirement can be difficult to satisfy if you
are not careful in how you construct the OVF package, but there are
ways to do it. To explain how, let us first look at the LAMP stack
example that we looked at in the beginning of the blog post, to see how
we can prepare it for delta disk compression. This LAMP stack had a
Linux VM running Apache HTTP Server and PHP and another Linux VM
running MySQL. Each VM had a single disk.

To achieve optimal disks for delta disk compression we first
create a plain Linux VM with one disk. We clone the VM so we now have
two plain Linux VMs. On one of them we install Apache HTTP Server and
PHP and on the other we install MySQL. By doing this we satisfy the
first criterion, that the disks have the same capacity (the Web server
VM’s disk and the database VM’s disk come from the same original plain
Linux VM), and the second criterion, that a significant part of the
disks overlap each other. The files from the OS part of the plain Linux install are at the
same position on both disks, since they were not changed when we
installed the Apache HTTP Server, PHP, and MySQL (or at least, the
majority of them have not changed).

The above example is rather canonical in how you achieve the
best results from delta disk compression when having multiple VMs using
the same operating system, so to summarize:

  1. Install a plain operating system in a VM;
  2. Clone the plain VM the number of times you need for your solution;
  3. Install the remaining software specific to each VM.

The reason why we first install a plain operating system and then
clone it to the number of VMs we need, rather than installing the same
operating system multiple times on each of the VMs we need, is that we
cannot in general be sure that the files are put at the exact same
location on the disks, even though it is the same operating system.

If cloning is not an option when making the OVF package then
perhaps VMware Studio is. It can create VMs well suited for delta disk
compression, since it builds the VM's operating system and other
software components in a scripted manner that can be replayed to
produce almost identical VMs.

Shrinking the Disks

When you export your VMs in your OVF package you want to make sure
that all unused space is zeroed out, since this compresses really well
in the VMDK disk format. However, swap disks and deleted files often
still take up space on disk, since they are not eagerly zeroed
out by default by most operating systems. This means that even though
your VM says it only uses about 500 MB it may actually take up a lot
more space. Even worse, you may have confidential information on, e.g.,
your swap drive or old deleted files that you do not want to distribute
with the OVF package. There are several ways to solve this problem. On
most Linux distributions it is possible to do the following things to
clean up a disk before you export the VM: 1) Un-mount the swap drive;
2) Write a single file to disk containing only zeroes as large as
possible; 3) Delete the file immediately after you created it. On the
command line you can do these three steps by invoking these commands:

  1. /sbin/swapoff -a (this will un-mount all swap disks)
  2. dd if=/dev/zero of=zeroFile.tmp
  3. rm zeroFile.tmp

On a Windows system this can be done in various ways. We will consider
Windows Server 2008, but the approach can be applied, with
modifications, to other types of Windows systems.

We start out by installing VMware Tools on the Windows
Server 2008 VM. Once it is installed, open VMware Tools and choose
“Shrink…”. This will zero out the disk. To zero out the swap disk you
need to set an option under Administrative Tools. Go to Administrative
Tools -> Local Security Policy -> Security Settings ->
Security Options and enable the policy “Shutdown: Clear virtual memory
pagefile”. When you shut down the VM the swap disk will then be zeroed
out. Please note, however, that enabling this option will increase the
shutdown time significantly for large swap disks. One way of working
around this problem could be to first delete the swap disk, reboot the
VM, disable the option again (hoping that no data is written to
the swap disk in the meantime), and then shut down the VM before
putting it in an OVF package.
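After zeroing free space with either approach, it can be useful to check how much of a disk really is zeroed. The following Python sketch counts all-zero blocks in a file. It assumes you have a raw (flat) disk image to inspect; the 4096-byte block size and the function name are hypothetical choices, and the demo uses a small temporary file in place of a real image.

```python
import os
import tempfile


def zero_block_fraction(path, block_size=4096):
    """Return the fraction of fixed-size blocks in `path` that hold only zero bytes."""
    total = zeros = 0
    with open(path, "rb") as image:
        while True:
            block = image.read(block_size)
            if not block:
                break
            total += 1
            if not any(block):
                zeros += 1
    return zeros / total if total else 0.0


# Demo: one block of data followed by three zeroed blocks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * 4096 + bytes(3 * 4096))
demo_fraction = zero_block_fraction(tmp.name)
os.remove(tmp.name)
print(demo_fraction)
```

A high fraction of zero blocks suggests the disk will compress well when exported.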

Creating an OVF Package with Delta Disk Compression using OVF Tool

Up until now we have not explained how to actually construct an OVF
package with delta disk compression, only what parts go into it. Even
though the ideas behind delta disks can seem a bit complicated it is
quite easy to use delta disk compression in your OVF package by using
OVF Tool. Basically, you use the option --makeDeltaDisks. The source may
be an OVF descriptor, a VMX descriptor, or a VIM source (for example, a VM
or vApp in vSphere). The target must be a directory. For example, we can
use delta disk compression on our LAMP OVF package by invoking this
command:
ovftool --makeDeltaDisks LAMP.ovf output-dir/

This will create a new OVF package in the output directory with delta
disk compression. That is, both the disks and the OVF descriptor are
updated, which means you can take the OVF package written in the output
directory and deploy it immediately without any manual post-processing.

There are no restrictions on the type of input you give OVF Tool in
terms of disks. It will try to create delta disk trees of all the disks
in the input OVF package and output the optimal OVF package in terms of
delta disk compression.

The disks that OVF Tool generates are compressed in the
VMDK virtual disk format, but it is possible to apply a second layer of
compression which may yield even smaller disks by using the --compress
option. Use --compress=9 for the best compression. On a package the size
of the LAMP OVF package (about 600 MB) it would yield about 30-40 MB
less disk space (in our experience). Delta disk compressing our LAMP
OVF package with this extra option would then simply be done by
invoking:

ovftool --makeDeltaDisks --compress=9 LAMP.ovf output-dir/
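If you want to run this step from a build script, the command line can be assembled in Python. The sketch below only uses the two options discussed in this post. The helper names are hypothetical, the 1 to 9 range check is an assumption based on 9 being the best level, and `run_delta_compression` assumes that `ovftool` is on the PATH and exits with a non-zero status on failure.

```python
import subprocess


def build_ovftool_command(source, target_dir, compress_level=None):
    """Build the ovftool argument list for delta disk compression."""
    command = ["ovftool", "--makeDeltaDisks"]
    if compress_level is not None:
        # Assumption: valid levels are 1 to 9, with 9 giving the best compression.
        if not 1 <= compress_level <= 9:
            raise ValueError("compress level must be between 1 and 9")
        command.append(f"--compress={compress_level}")
    command += [source, target_dir]
    return command


def run_delta_compression(source, target_dir, compress_level=None):
    """Run ovftool; raises CalledProcessError if it reports a failure."""
    subprocess.run(build_ovftool_command(source, target_dir, compress_level), check=True)


print(" ".join(build_ovftool_command("LAMP.ovf", "output-dir/", 9)))
```

Building the argument list separately from running it makes it easy to log or inspect the exact command before OVF Tool is invoked.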

Learn more about what OVF Tool can do by visiting the OVF forum at VMware: http://communities.vmware.com/community/developer/forums/ovf. There you can ask questions about OVF Tool and related products.

OVF Localization

OVF packages are ideal for distributing complex software packages. As we saw in the post on "Inside the OVF Package",
it is possible to tailor the same OVF package for different hypervisors
using deployment options. Moreover, it is possible to add meaningful
human readable descriptions to the different parts of the OVF
descriptor, in places like product information, EULA sections and
deployment options. This enables deployment wizards (like the vSphere
client) to give the user a great experience when deploying an OVF
package, since it can use specific messages relevant to a particular
OVF descriptor. For example, the user can see messages about the
intention of deployment options and properties, and how to use them.
This blog post is about what to do about all that human readable
metadata, when you want to distribute your OVF package in different
countries and for different languages.

The OVF specification includes an internationalization section that
describes how to localize an OVF descriptor. This lets you address the
issue of translating all the human readable metadata in the OVF
descriptor without having to keep multiple copies of the descriptor,
each localized to a specific language. Obviously it does not make sense
to localize everything in an OVF descriptor, since some information is only
intended to be read by a machine. A rule of thumb is that you can localize all the elements that carry some kind of metadata that
is useful to the deployer of the OVF package but not needed by the
deployment platform. For a full list of elements that can be localized
please see the list at the end of this blog post.

Example

Let us take a look at how the user experiences an OVF package that has been localized:

OVF deployment using default language.

Ovfdeploy_english 

OVF deployment using German localization.

Ovfdeploy_german

The vSphere Client attempts to use the localized messages from the OVF
package that match the locale of the user's Windows installation. If
a matching localization is not found, the default language of the OVF
descriptor (English in the example) is used.

Localizing the OVF Descriptor

Let’s look at how to write an OVF descriptor that is localized to
multiple languages, including both English and German as in the above
example, but also Danish and Swahili:

<Envelope xml:lang="en-US">
  …
  <VirtualSystem ovf:id="MyVM">
    …
    <ProductSection>
      <Info>Information about the installed software</Info>
      <Property ovf:key="num_connections" ovf:type="string"
        ovf:userConfigurable="true">
        <Label ovf:msgid="num_connections.label">
          Number of connections
        </Label>
      </Property>
      <Property ovf:key="admin_address" ovf:type="string"
        ovf:userConfigurable="true">
        <Label ovf:msgid="admin_address.label">Administrator address</Label>
        <Description ovf:msgid="admin_address.description">
          Email address of the systems administrator
        </Description>
      </Property>
    </ProductSection>
    …
  </VirtualSystem>
 
  <!-- German localized messages -->
  <Strings xml:lang="de-DE">
    <Msg ovf:msgid="num_connections.label">Zahl der Anschlüsse</Msg>
    <Msg ovf:msgid="admin_address.label">Verwalteradresse</Msg>
    <Msg ovf:msgid="admin_address.description">
      Email address des Systemverwalters
    </Msg>
  </Strings>
 
  <!-- Danish localized messages -->
  <Strings xml:lang="da-DK">
    <Msg ovf:msgid="num_connections.label">Antal forbindelser</Msg>
    <Msg ovf:msgid="admin_address.label">Administrator adresse</Msg>
    <Msg ovf:msgid="admin_address.description">
      System administratorens email-adresse
    </Msg>
  </Strings>
 
  <!-- Swahili localized messages -->
  <Strings xml:lang="sw">
    <Msg ovf:msgid="num_connections.label">Idadi ya connections</Msg>
    <Msg ovf:msgid="admin_address.label">Administrator anwani</Msg>
    <Msg ovf:msgid="admin_address.description">
      Barua pepe ya system administrator
    </Msg>
  </Strings>
</Envelope>

In this example, we are localizing the label of the "num_connections" property and the label and description of the "admin_address" property.

As you can see, it is pretty straightforward to localize an OVF descriptor to support multiple languages:

  1. First prepare the OVF descriptor for localization by adding an
    ovf:msgid attribute to each of the elements you want to be localized
    (see the list of possible elements in the last section of this blog entry)
    and give the ovf:msgid a unique value;
  2. Next you add a Strings section for each of the locales you
    want to support and specify the locale by using the xml:lang attribute;
  3. For each ovf:msgid attribute in the OVF descriptor create a Msg element in the Strings section with the same ovf:msgid value.
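Step 3 is easy to get wrong when the number of messages grows, so it can help to check it mechanically. The following Python sketch lists, for each Strings section, the ovf:msgid values used in the descriptor that have no matching Msg element. The namespace is assumed to be the OVF 1.0 envelope namespace (the snippets in this post omit it), the function name is hypothetical, and the embedded descriptor is a shortened version of the example with one German message deliberately left out.

```python
import xml.etree.ElementTree as ET

OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"  # assumption: OVF 1.0 namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MSGID = f"{{{OVF_NS}}}msgid"
MSG_TAG = f"{{{OVF_NS}}}Msg"

DESCRIPTOR = f"""
<Envelope xmlns="{OVF_NS}" xmlns:ovf="{OVF_NS}" xml:lang="en-US">
  <VirtualSystem ovf:id="MyVM">
    <ProductSection>
      <Info>Information about the installed software</Info>
      <Property ovf:key="admin_address" ovf:type="string">
        <Label ovf:msgid="admin_address.label">Administrator address</Label>
        <Description ovf:msgid="admin_address.description">
          Email address of the systems administrator
        </Description>
      </Property>
    </ProductSection>
  </VirtualSystem>
  <Strings xml:lang="de-DE">
    <Msg ovf:msgid="admin_address.label">Verwalteradresse</Msg>
  </Strings>
</Envelope>
"""


def missing_translations(xml_text):
    """Map each Strings locale to the msgids used in the descriptor but not translated."""
    root = ET.fromstring(xml_text)
    translated = {
        strings.get(XML_LANG): {msg.get(MSGID) for msg in strings}
        for strings in root.findall(f"{{{OVF_NS}}}Strings")
    }
    used = {
        element.get(MSGID)
        for element in root.iter()
        if element.get(MSGID) is not None and element.tag != MSG_TAG
    }
    return {locale: sorted(used - ids) for locale, ids in translated.items()}


missing = missing_translations(DESCRIPTOR)
print(missing)
```

An empty list for a locale means every localizable element in the descriptor has a translation in that Strings section.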

The human readable messages in the example ProductSection element
are used as the default language and will be used if no appropriate
locale is available in the descriptor. To specify the default language
of the OVF descriptor (the language used in the default messages), set
the xml:lang attribute on the top Envelope element level. In our
example we have set it to US English (en-US).

The format of the locale is standard and is written as
[language]-[country]-[variant]. It is not necessary to specify the full
locale for a string bundle. For example, if we simply specify the
German locale in the example as de, it can then be used for several
German-speaking countries such as Austria (de-AT), Germany (de-DE),
Luxembourg (de-LU) and parts of Switzerland (de-CH). If a user uses
locale X on their computer, we select the locale in the OVF descriptor
that matches the beginning or all of X. If two locales match X we
choose the one that gives the longest match. This is why changing the
locale from de-DE to de means that the localization can be used on the
locales de-AT, de-DE, de-LU and de-CH, since de is a prefix of those.
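The matching rule can be sketched as a short Python function. This is our own illustration of the longest-prefix rule described above, not the vSphere Client's actual code. The function name is hypothetical, and the sketch assumes hyphen-separated locale parts compared case-insensitively.

```python
def select_locale(user_locale, available):
    """Pick the available locale that is the longest prefix of the user's locale.

    Returns None when nothing matches, in which case the default
    language of the OVF descriptor would be used.
    """
    user_parts = user_locale.lower().split("-")
    best, best_len = None, 0
    for candidate in available:
        parts = candidate.lower().split("-")
        # A candidate matches if all of its parts equal the leading parts of X.
        if user_parts[:len(parts)] == parts and len(parts) > best_len:
            best, best_len = candidate, len(parts)
    return best


print(select_locale("de-AT", ["de", "da-DK", "sw"]))
print(select_locale("de-DE", ["de", "de-DE", "sw"]))
print(select_locale("fr-FR", ["de", "da-DK", "sw"]))
```

A user with locale de-AT gets the de bundle, a user with de-DE gets the more specific de-DE bundle when both exist, and a French user falls back to the default language.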

External String Bundles

Often when managing multiple locales it is impractical to keep
them in the same file. For this purpose the OVF specification allows
you to extract all the Strings elements and put them in separate files.
From the example OVF descriptor we have used, we can take the three
localizations and put them in separate files (we will call these
files string bundles) that the OVF descriptor then references:

<Envelope>
  <References>
    <File ovf:href="MyVM-de-DE.msg" ovf:id="msgs1"/>
    <File ovf:href="MyVM-da-DK.msg" ovf:id="msgs2"/>
    <File ovf:href="MyVM-sw.msg" ovf:id="msgs3"/>
    <File ovf:href="MyVM-disk1.vmdk" ovf:id="file2" ovf:size="9580544"/>
    …
  </References>
  …
</Envelope>

As shown above, the files are referenced at the beginning of the
References section in the OVF descriptor (important: string bundles must
be listed first in the References section). The three string bundle
files look like this:

File: MyVM-de-DE.msg

<!-- German localized messages -->
<Strings xml:lang="de-DE">
  <Msg ovf:msgid="num_connections.label">Zahl der Anschlüsse</Msg>
  <Msg ovf:msgid="admin_address.label">Verwalteradresse</Msg>
  <Msg ovf:msgid="admin_address.description">
    Email address des Systemverwalters
  </Msg>

</Strings>
 

File: MyVM-da-DK.msg

<!-- Danish localized messages -->
<Strings xml:lang="da-DK">
  <Msg ovf:msgid="num_connections.label">Antal forbindelser</Msg>
  <Msg ovf:msgid="admin_address.label">Administrator adresse</Msg>
  <Msg ovf:msgid="admin_address.description">
      System administratorens email-adresse
  </Msg>

</Strings>
 

File: MyVM-sw.msg

<!-- Swahili localized messages -->
<Strings xml:lang="sw">
  <Msg ovf:msgid="num_connections.label">Idadi ya connections</Msg>
  <Msg ovf:msgid="admin_address.label">Administrator anwani</Msg>
  <Msg ovf:msgid="admin_address.description">
    Barua pepe ya system administrator
  </Msg>

</Strings>

In the above example we created three files, one for each locale. The OVF
specification allows multiple Strings elements in the same file next to
the OVF descriptor, so it is not necessary to create a file per locale
as we did. However, by keeping the locales in separate
string bundles it becomes easy to extend the supported locales simply by
adding more string bundle files.

Further Reading

To learn more about localization, check out the internationalization section in the OVF 1.0.0 specification: http://www.dmtf.org/standards/published_documents/DSP0243_1.0.0.pdf

List of Localizable Elements

The text in the following elements can be localized:
  • Info element on VirtualSystem and VirtualSystemCollection
  • Name element on VirtualSystem and VirtualSystemCollection
  • Info element on AnnotationSection, DeploymentOptionSection,
    DiskSection, EulaSection, InstallSection, NetworkSection,
    OperatingSystemSection, ProductSection, ResourceAllocationSection,
    StartupSection and VirtualHardwareSection.
  • Annotation element on AnnotationSection
  • License element on EulaSection
  • Description element on NetworkSection
  • Description element on OperatingSystemSection
  • Description, Product, Vendor, Label, and Category elements on ProductSection
  • Description and Label elements on DeploymentOptionSection
  • ElementName, Caption and Description sub-elements on the System element in VirtualHardwareSection
  • ElementName, Caption and Description sub-elements on Item elements in VirtualHardwareSection
  • ElementName, Caption and Description sub-elements on Item elements in ResourceAllocationSection