vSphere 5.1 introduced a new Auto Deploy mode called Stateless Caching, and I’m finding there is a good deal of confusion about its benefits, particularly when it comes to its role in protecting against PXE infrastructure and vCenter Server outages.  As this topic is too much for a single blog post, I’m going to address it with a series of posts.  Today I’ll start by providing some background on vSphere Auto Deploy along with an overview of how stateless caching works.

Auto Deploy Stateless Mode

Auto Deploy was first introduced with vSphere 5.0.  Auto Deploy is commonly referred to as a “stateless” or “diskless” architecture because vSphere hosts that boot using Auto Deploy do not use a boot disk.  Instead, the hosts boot over the network using PXE and load ESXi from an Auto Deploy server.  For more information on how Auto Deploy works please check out this short video available on the VMware TechPubs YouTube Channel.

Auto Deploy running in stateless mode has many benefits.  Because there is no requirement for a dedicated boot disk, there is an immediate cost savings.  In addition, when using SAN storage, Auto Deploy can drastically simplify the storage architecture by eliminating the need to configure and zone boot LUNs for each host.  Along with the cost savings and simplified configuration, Auto Deploy also makes it very easy to provision new hosts quickly.  Adding a new host to a cluster is as easy as putting a server in a rack, connecting it to the network and turning it on.  The new host PXE boots over the network, the Auto Deploy server loads ESXi onto it and automatically connects the host to vCenter, and a host profile is then used to configure the host.

It’s important to note that vSphere hosts configured for Auto Deploy have several dependencies.  They rely on PXE to network boot, they rely on the Auto Deploy server to load the ESXi image profile and they rely on the vCenter Server for their configuration.  If an outage affects any of these components, you will be unable to (re)boot the stateless hosts until the outage is resolved.  This is where stateless caching comes in.
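To make the dependency chain concrete, here is a minimal Python sketch.  It is only a conceptual model of the paragraph above, not anything that ships with vSphere: the function name `missing_dependencies` and its three boolean parameters are made up for illustration.

```python
from typing import List


def missing_dependencies(pxe_up: bool, auto_deploy_up: bool, vcenter_up: bool) -> List[str]:
    """Return the components a purely stateless host still needs but cannot reach.

    Hypothetical model: the three flags stand in for whatever health checks
    you would run against your own environment.
    """
    status = {
        "PXE infrastructure": pxe_up,           # needed to network boot
        "Auto Deploy server": auto_deploy_up,   # needed to load the ESXi image profile
        "vCenter Server": vcenter_up,           # needed to apply the host configuration
    }
    return [name for name, up in status.items() if not up]


def can_reboot_stateless_host(pxe_up: bool, auto_deploy_up: bool, vcenter_up: bool) -> bool:
    """A stateless host can (re)boot only when no dependency is missing."""
    return not missing_dependencies(pxe_up, auto_deploy_up, vcenter_up)
```

In this model a single missing component is enough to block a reboot, which is exactly the exposure described above.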

Auto Deploy Stateless Caching Mode

To help address concerns around the reliance on PXE, VMware added a new “Stateless Caching” option to Auto Deploy with vSphere 5.1.  With stateless caching each host is assigned a dedicated boot disk.  This boot disk can be a local disk, a LUN, or a USB/SD device.  The boot disk then serves as a backup boot device that can be used to boot the host in the event of a PXE boot failure.

Under normal conditions a host configured for stateless caching will use PXE each time it boots, just like a stateless host.  The difference is that following each successful PXE boot, and after the host profile has been applied by vCenter, the ESXi image running in memory is saved to the boot disk.  If a problem later prevents the host from successfully PXE booting, the host is able to fall back to booting from the cached image on the disk.  In other words, stateless caching gives you many of the benefits of stateless mode, with the added assurance that a host can still boot when PXE is unavailable.
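The boot behavior described above can be sketched as a small Python model.  Again, this is a simplified illustration under my own assumptions, not VMware code: the `Host` class, the `boot` function and the `"pxe"` / `"cached"` return values are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    stateless_caching: bool
    cached_image: Optional[str] = None  # image last saved to the boot disk, if any


def boot(host: Host, pxe_ok: bool, image_profile: str) -> str:
    """Model one boot attempt and return how the host booted.

    Raises RuntimeError when the host has no way to boot.
    """
    if pxe_ok:
        # Normal path: ESXi is loaded over the network and the host
        # profile is applied, exactly as for a plain stateless host.
        if host.stateless_caching:
            # After a successful PXE boot the running image is saved to disk.
            host.cached_image = image_profile
        return "pxe"
    if host.stateless_caching and host.cached_image is not None:
        # PXE failed: fall back to the image cached on the boot disk.
        return "cached"
    raise RuntimeError(f"{host.name}: PXE boot failed and no cached image is available")
```

Note that in this model the fallback only works if the host has completed at least one successful PXE boot, because that is when the image gets written to the boot disk.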

While stateless caching does provide a level of protection against PXE outages, it also has some drawbacks.  Chief among them is that it negates some of the benefits of running a diskless architecture: you still have the added cost and complexity of allocating a boot disk to each host.  Keep this in mind when weighing the pros and cons, as you may determine that the limited protection offered by stateless caching is not worth the added expense and complexity of provisioning a boot disk for each host.

In addition to losing some of the benefits associated with a diskless architecture, stateless caching also has limitations when it comes to the types of outage scenarios it is able to protect against.  This is where I’m seeing a lot of the confusion.  Many users see stateless caching as a solution that can be used to protect against network failures, PXE infrastructure failures, Auto Deploy server failures and even vCenter Server failures.  While it can provide some limited protection against PXE failures and an Auto Deploy server failure, it is not a solution that can protect against network outages or vCenter Server failures.  Over the next few posts I will break each of these failure scenarios down and show how stateless caching may or may not help for each.

Stay tuned…

For notification of future posts please follow me on Twitter @VMwareESXi.

About the Author

Kyle Gleed

Kyle Gleed is a Group Manager within VMware’s Integrated Systems Business Unit (ISBU) where he leads a team focused on the adoption and deployment of the solutions and capabilities of the Software-Defined Data Center. Follow Kyle on twitter @Kyle_Gleed