It is rare, but you may see windows opening and a cursor moving in a lab that you have just deployed in the VMware Hands-on Labs. Before you reach for the End button to kill the lab, in the immortal words of Douglas Adams, “Don’t Panic!”

Does this mean that you have been assigned the same lab that someone else is already using? Based on the way that lab entitlements are managed by the VMware Learning Platform, that is unlikely. Instead, this probably has to do with some new monitoring and proactive failure notification we implemented in a subset of our 2014 labs.

The “Pre-Pop”

As a quick refresher on our deployment architecture, we maintain a few deployed copies (“pre-pops”) of each lab so that we can handle our usual demand and get you up and running in a lab within a minute or two rather than requiring you to wait as a fresh lab is deployed and booted.

When you request a lab, you get the one off the top of the stack, and the system automatically backfills by deploying a new one to replace the one you have consumed. Pretty slick, right? If a certain lab unexpectedly becomes insanely popular, it is possible for our pool of pre-pops to be exhausted, in which case you would need to wait for the full deploy/boot/initialize cycle.
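The pool-with-backfill behavior described above can be sketched in a few lines. This is a simplified illustration, not our actual tooling: the class name, the SKU-style naming, and the instant `deploy()` stand-in are all hypothetical.

```python
class PrePopPool:
    """Minimal sketch of a pre-pop pool: hand out a ready lab from the
    top of the stack and immediately deploy a replacement behind it."""

    def __init__(self, lab_sku, target_depth):
        self.lab_sku = lab_sku
        self.deployed = 0
        # Pre-populate the pool up to the target depth.
        self.ready = [self.deploy() for _ in range(target_depth)]

    def deploy(self):
        # Stand-in for the real (slow) deploy/boot/initialize cycle.
        self.deployed += 1
        return f"{self.lab_sku}-{self.deployed:03d}"

    def checkout(self):
        """Give the user a lab and backfill the pool."""
        if self.ready:
            lab = self.ready.pop()            # take the one off the top
            self.ready.append(self.deploy())  # automatic backfill
            return lab
        # Pool exhausted: the user waits for a full deployment.
        return self.deploy()

pool = PrePopPool("HOL-SDC-1402", target_depth=3)
lab = pool.checkout()  # served from the pre-pop pool, so it is instant
```

In the real system the backfill deployment takes time, which is exactly why a sudden spike in popularity can drain the pool faster than it refills.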

The Problem

We’re deploying these labs into a cloud environment where load and contention are unpredictable. This means that some deployments don’t quite come up 100%. Each of our pods is effectively a tiny datacenter that must be booted and brought online. If you had to completely shut down your datacenter, I bet it would take a bit of time to bring back online, and you’d likely have to do it in stages, with validation performed between each stage (nobody I know just powers everything on at once).

The same goes for our labs, but the unpredictable loads on the cloud infrastructure can cause variations in the timing. Sometimes, certain components have not been up long enough before other components try to use them: you can’t power up a VM from a storage array that hasn’t finished booting.

Developing a Solution

In an effort to address some of these issues, we have implemented some basic checks, waits, and upstream status reporting in our labs. During this pilot, our team receives notifications of these failed deployments so that we can log in and take corrective action. This kind of feedback has been invaluable to our team and has provided the opportunity for us to classify the issues, determine possible root causes, and develop remediations.
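A check-wait-report loop of this shape might look like the sketch below. The check names and the `notify` hook are placeholders for illustration, not the real HOL monitoring pipeline.

```python
import time

def watch_deployment(checks, notify, retries=3, delay=10.0, sleep=time.sleep):
    """Run each readiness check, retrying with a wait between attempts.
    If a check still fails after its retries, report it upstream so an
    operator can log in and take corrective action.

    `checks` is a list of (name, check_fn) pairs; `notify` is whatever
    sends the alert (email, chat message, ticket, ...).
    """
    failures = []
    for name, check in checks:
        for _attempt in range(retries):
            if check():
                break            # component is up; move on
            sleep(delay)         # give it time to finish coming up
        else:
            failures.append(name)
            notify(f"deployment check failed: {name}")
    return failures
```

The value of the wait-then-retry step is that it separates the common case (a component that is merely slow) from a genuine failure that needs a human, which is what makes the upstream notifications actionable rather than noisy.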

Unfortunately, due to the nature of the system, it is possible for the lab we’re fixing to be assigned out to one of our users. Most of the time, we can fix the lab within a few minutes, so we greatly appreciate your patience in not ending the lab. The goal is to have remediations for common issues built into the pods we are developing in 2015 so that these things can heal themselves.

In the meantime, if you see someone poking around in the lab that was just deployed for you, don’t panic. We should be finished within a few minutes. If you’d like to fire up a Notepad window and let us know that you’re there, we can give you an idea of how long it might take for the lab to be ready for you. We’d rather catch it now than have you run into issues an hour into the lab because, for example, the NFS storage didn’t come up quite right.

As always, the best source of status information is our “desktop tattoo,” which displays the Ready state of your lab in the lower right corner of the desktop. If this says something besides “Ready,” your lab is still initializing and you should give it a few minutes to finish starting.

Thanks for reading, and enjoy your labs!