Auto Deploy Stateless Caching – Auto Deploy Server Outage

This post continues my blog series on Auto Deploy stateless caching. So far I've covered how stateless caching works, how it works with network-isolated hosts, and how it can help protect against PXE component failures. Let's now look at the role stateless caching plays in protecting against an Auto Deploy server outage.

The Auto Deploy server has two components: the rules engine and the web server.

(1) Rules engine: parses the host attributes to identify the image profile, host profile, and vCenter location (cluster or folder).  The rules are created ahead of time by the administrator using PowerCLI.

(2)  Web server: copies the ESXi image profile to the host along with a copy of the host profile definition that will be used to configure the host after it connects to the vCenter server.
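To make the rules engine's job concrete, here is a minimal sketch of attribute matching. It is not the Auto Deploy implementation or its PowerCLI interface. The `Rule`, `Assignment`, and `match_rule` names, the attribute keys, and the "first matching rule wins, and each rule carries all three items" behavior are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a host-attribute pattern plus the items it assigns."""
    pattern: dict        # attribute name -> required value; empty matches any host
    image_profile: str
    host_profile: str
    location: str        # vCenter cluster or folder


@dataclass(frozen=True)
class Assignment:
    image_profile: str
    host_profile: str
    location: str


def match_rule(host_attrs: dict, rules: Sequence[Rule]) -> Optional[Assignment]:
    """Return the assignment from the first rule whose pattern the host satisfies."""
    for rule in rules:
        if all(host_attrs.get(key) == value for key, value in rule.pattern.items()):
            return Assignment(rule.image_profile, rule.host_profile, rule.location)
    return None  # no rule matched, so nothing is assigned to this host
```

With a vendor-specific rule followed by a catch-all rule, a host reporting the matching vendor gets the first rule's cluster, and any other host falls through to the catch-all.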

When you install the Auto Deploy server (or enable it in the vCenter Server Appliance (VCSA)), both of these components are installed and configured together, so there's not much you need to be concerned with as far as setup and configuration. As such, I won't go into any more detail about these components here. I just want to point out that although we typically view the Auto Deploy server as a single entity, under the covers it's really two things: the rules engine and the web server (aka waiter).

If an outage affects the Auto Deploy server, the network boot will fail, causing the host to fall back to booting from disk. The disk holds a copy of the ESXi image profile that was cached during the last successful stateless boot. The image below shows an example of a host booting from the cached disk in the face of an Auto Deploy server outage.

In this case the results are very similar to those discussed in my last post about PXE component failures. The host will fail to complete the PXE boot, which will cause it to fall back to booting from the cached disk image. Once the host has booted from the disk, and assuming the vCenter server is online and the host has connectivity to it, the admin can then manually reconnect the host.

Note: Auto Deploy stateless caching does protect against an Auto Deploy server outage assuming the host can connect to vCenter.  Manual admin intervention is required.
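The fallback behavior described above can be sketched as a small decision function. This is only a model of the boot order, not firmware or ESXi code. The `BootSource` and `select_boot_source` names are hypothetical, and the sketch assumes the BIOS is set to try the network first and the disk second.

```python
from enum import Enum


class BootSource(Enum):
    NETWORK = "network"
    CACHED_DISK = "cached disk"
    NONE = "none"


def select_boot_source(pxe_ok: bool, auto_deploy_ok: bool, has_cached_image: bool) -> BootSource:
    """Model a BIOS boot order of network first, then local disk."""
    # A network boot needs both the PXE components and the Auto Deploy server.
    if pxe_ok and auto_deploy_ok:
        return BootSource.NETWORK
    # Otherwise fall back to the image cached during the last stateless boot.
    if has_cached_image:
        return BootSource.CACHED_DISK
    # With no cached image there is nothing left to boot from.
    return BootSource.NONE
```

In this model an Auto Deploy server outage and a PXE component failure lead to the same outcome, which is why the two cases behave so similarly.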

Booting a host from the cached disk image in the face of an Auto Deploy server outage comes with the same cautions that were mentioned for dealing with a PXE component failure:

  1. The host's BIOS must be set to network boot first and fall back to booting from disk if the network boot fails.
  2. Reconnecting an auto deploy host that has booted off the cached disk image is a manual step.  Admin intervention is required to reconnect the host.
  3. When the host reconnects, vCenter will detect that the host has booted from the cached disk image and will flag the host as having not booted stateless.  While this flag is set, the host cannot become compliant with the host profile.  The only way to clear this flag and bring the host back into compliance is to resolve the outage and reboot the host.
  4. Once manually reconnected to vCenter, the host will successfully rejoin the cluster and will be able to host virtual machines.  However, remember that the host will be flagged as having booted from the disk image, and a reboot will be required to clear this flag.
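The cautions above amount to a small state model, sketched below. The `HostState` class and the helper functions are hypothetical names, not a vCenter API. The sketch also assumes that a normal stateless boot leaves the host connected to vCenter without admin action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostState:
    booted_stateless: bool  # False means the host booted from the cached disk image
    connected: bool         # True once the host is connected to vCenter


def boot(auto_deploy_online: bool) -> HostState:
    """Boot the host; an Auto Deploy outage forces the cached-disk path."""
    if auto_deploy_online:
        # Assumption: a normal stateless boot ends with the host connected.
        return HostState(booted_stateless=True, connected=True)
    return HostState(booted_stateless=False, connected=False)


def manual_reconnect(state: HostState, vcenter_reachable: bool) -> HostState:
    """Admin reconnects the host; this only works if vCenter is reachable."""
    if not vcenter_reachable:
        return state
    return HostState(state.booted_stateless, connected=True)


def can_run_vms(state: HostState) -> bool:
    return state.connected


def can_be_compliant(state: HostState) -> bool:
    # The "not booted stateless" flag blocks host profile compliance.
    return state.connected and state.booted_stateless
```

Walking a host through `boot(False)`, then `manual_reconnect`, and finally `boot(True)` once the outage is resolved mirrors the sequence in the list: the host can run virtual machines after the reconnect, but only the final stateless reboot restores compliance.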

Next up, we'll look at how stateless caching works in the face of a vCenter server outage.

For notifications on future posts, follow me on Twitter @VMwareESXi.