Behind the scenes of VMware’s Hands-on Labs, there is a lot of “care and feeding” of the lab infrastructure, manuals, and the lab environments themselves. You might be surprised at how much opportunity there is for things to stop working, even in environments that have been precisely configured and then effectively frozen in time. For example, SSL certificates and password expiration policies are the bane of my existence…
On a call at the end of 2013, I demonstrated some of my maintenance workflows to others on my team, and they thought that at least some of our readers would be interested in what I am doing behind the scenes. So, I am kicking off a short series of posts to outline some of the challenges we face and the solutions we have developed, with the hope that someone out there finds them useful, or at the very least, interesting.
In this introductory post, I will cover some of what we do for both the public HOL and the VMworld shows and call out some similarities and differences. In general, the differences are the capacities and levels of redundancy involved, but the general model is the same. To set the proper context, I typically work as a cloud consumer rather than a cloud administrator, so I don’t usually talk about the hardware we’re running. 🙂
It is no secret that we use our own vCloud Suite to host the hands-on labs. Based on that, I spend a great deal of time these days using vCloud Director (vCD). We are currently using version 5.1 with plans to upgrade to 5.5 in the near future. For those wondering, we are not using any “special” builds of vCD, but the same off-the-shelf code that any vCD customer could be using.
Our labs are composed of two main components: the lab manual and the lab environment. These components are brought together using a custom web portal called Project NEE (Next Generation Education Environment). I work mainly with the lab environments, which are vCD vApp templates we call vPods. Each vPod is developed by teams of product experts in the Hands-on Labs development cloud. Once development has been completed, the vPods must be exported from the development cloud and migrated to one or more of our hosting clouds used to run the labs for the public HOL or various events. The number and location of these hosting clouds varies and depends on the defined performance, scalability, and availability requirements… and cost.
Large Events (VMworlds, Partner Exchange, etc.)
VMworld US has historically been our showcase for providing customers hands-on experience with our latest products. Due to the massive popularity and scale, running the labs at this large event requires high degrees of performance, scalability and redundancy. We typically have a larger-than-normal budget, dedicated infrastructure and crank everything up past 11. It is not uncommon for us to have vendors loan us their latest and greatest gear to support the labs and showcase its performance for our insanely stressful workloads. In 2013, we had some pre-release EMC XtremIO storage serving our labs, and it was awesome.
Don’t forget our DR plan. If we lose access to one of our datacenters and everything goes down, that’s a pretty public and visible issue. An hour of downtime during an event that lasts a week has a pretty large impact, so we want to avoid that. We’re not perfect, but we try to identify and eliminate all single points of failure and have at least two of everything.
VMworld Europe is a smaller event and requires less capacity and scalability, but it has the same availability and performance requirements. What is unique for that event is that we aim to have the primary cloud source located “nearby” in order to keep latency low and maximize user experience. Eliminating transatlantic WAN trips is a good way to do that, so we try to run the labs out of a datacenter in Europe for that event.
Public Hands-on Labs Portal
At the other end of the spectrum, the public HOL site is our free service which we run on standard, shared infrastructure that is part of our private cloud. We do our best to stay on top of the availability, capacity and performance of this offering, but things do happen. At times, it may take a little longer to access the labs, or we may have outages that render them inaccessible for a bit. Availability and support is on a “best effort” basis, but since we don’t charge anything for this service, our users tend to be very understanding.
With that bit of background out of the way, you can see why we have multiple clouds. Unfortunately, with multiple clouds comes additional complexity. In order to start a vPod in a given cloud, the template for that vPod must exist in that particular cloud. Some of our challenges are keeping track of which versions of which vPods are present in which clouds, determining which templates are authoritative, and then resolving any differences.
These labs and vPods are not static entities: software versions are updated, bugs are fixed, and lab flow is tweaked on a regular basis — a lot of that occurs in response to feedback from our vast community of HOL users (THANK YOU!). We read all of the feedback from the post-lab surveys and try to stay on top of the postings on the VMware HOL Community. Any change to a vPod requires that a new version of the pod be spawned and replicated around to any of the clouds hosting that vPod.
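To give a feel for the reconciliation problem described above, here is a minimal sketch of the core logic: given each cloud’s inventory of vPod templates and versions, flag any vPod whose version differs or is missing somewhere. This is an illustrative Python sketch only (our actual tooling is PowerCLI, covered in the next post), and all cloud and vPod names below are hypothetical examples:

```python
# Minimal sketch: detect vPod version drift across multiple clouds.
# All cloud names, vPod names, and versions here are hypothetical.

def find_version_drift(inventories):
    """Given {cloud: {vpod_name: version}}, return the vPods whose
    version differs, or is absent, in at least one cloud."""
    all_vpods = set()
    for templates in inventories.values():
        all_vpods.update(templates)

    drift = {}
    for vpod in sorted(all_vpods):
        # Map each cloud to its version of this vPod (None if missing).
        versions = {cloud: templates.get(vpod)
                    for cloud, templates in inventories.items()}
        if len(set(versions.values())) > 1:
            drift[vpod] = versions
    return drift

# Example: one vPod is stale in us-west and missing entirely in europe.
inventories = {
    "us-east": {"HOL-SDC-1301": "v5", "HOL-SDC-1302": "v2"},
    "us-west": {"HOL-SDC-1301": "v4", "HOL-SDC-1302": "v2"},
    "europe":  {"HOL-SDC-1302": "v2"},
}
print(find_version_drift(inventories))
# → {'HOL-SDC-1301': {'us-east': 'v5', 'us-west': 'v4', 'europe': None}}
```

The real version of this check pulls the inventories live from each vCloud Director instance rather than from hard-coded dictionaries, but the diff at the heart of it is the same.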
In my next (shorter!) post, I will look at how we leverage PowerCLI to verify that the same versions are present in each of the multiple clouds we use to host our labs. In future posts, I will cover how we efficiently and securely replicate the vPods between sites and talk a little about some fun we have with Windows VMs running in clouds backed by different hardware.