Hey, have you seen my Windows 2012 template?  Which one, you ask?  Simple, I just mean the one with SP1.  No, not .NET SP1…  I mean the one with WINDOWS SP1, and just plain ol’ .NET 4.5.  What?  NO, not the one with IE 11!  Seriously, you know none of our apps work past IE 8.  Wait, which version of Java??  WHO runs that version!?  You didn’t put that into the MASTER template, DID YOU!!??

When template sprawl gets out of hand, maintaining all the variations and keeping track of all the things that are in this template but not in that template can turn into a real mess.  Fortunately, as vSphere Admins, we are responsible for keeping order; it’s our duty to rein things in every now and again.  My goal for this article is to pass along a few tips to help you maintain some control over your VM templates.

Less is more

The first order of business in controlling your templates is to minimize the total number you need in your environment.  Ideally, the goal here is to get down to one template for each OS version you support.  Sound challenging?  Well, think of it this way: once you’ve created a template, even if its creation process was fairly easy, you then have to maintain it – that’s years of patches, software upgrades, managing configuration drift, etc.  Not to mention the fact that successive updates can easily introduce mysterious problems and quirks into a template.  When you take all of these factors into consideration, the ROI on the table is compelling enough to justify the challenge of re-working how things are done.

The methods for getting there, however, will vary depending on what tools you have at your disposal.  Specifically, I’m referring to whether or not your organization uses a server automation tool.  Examples of such tools include Puppet, SCCM, and BladeLogic, to name just a few.

Path A: Templates with lots of content

First, let’s start with the more basic scenario in which you’re not using a server automation tool.  For this scenario, you’ll want to create templates that meet the needs of as many of your applications as possible on a given OS, with a target of 100%.  This means you should include all your service packs, .NET, Java, agents, VMware Tools, and other content, to the maximum version that is compatible with all the apps that need to run on that OS.   This also means you’ll want to leave out anything that isn’t compatible with all your apps for that OS.  Instead of including these less-compatible items in additional templates, install them along with the specific apps that require them.  The work of an extra install for a handful of application servers will most likely pale in comparison to the years of maintenance required by each additional template.
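To make the "maximum version that is compatible with all the apps" rule concrete, here is a minimal Python sketch.  The app names and version lists are made up for illustration, and the function name is my own; the idea is simply to intersect what every app supports and take the highest version that survives.

```python
def highest_common_version(supported_by_app):
    """Return the highest version that every app supports, or None if there is none."""
    version_sets = [set(versions) for versions in supported_by_app.values()]
    if not version_sets:
        return None
    common = set.intersection(*version_sets)
    # Versions are (major, minor) tuples, so they compare numerically: (4, 10) > (4, 9).
    return max(common, default=None)


# Hypothetical data: the .NET versions each app is certified against.
dotnet_support = {
    "billing": [(3, 5), (4, 0), (4, 5)],
    "intranet": [(4, 0), (4, 5)],
    "legacy-crm": [(3, 5), (4, 0)],
}

template_dotnet = highest_common_version(dotnet_support)
print(template_dotnet)
```

In this made-up example the template would carry .NET 4.0, and 4.5 would be installed alongside the apps that need it.  If the function returns None, no single version satisfies every app, which is your cue to start the vendor conversations.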

If your Windows Server applications require app-specific OS settings or Registry keys, you can leverage GPOs to apply them post-cloning, and leave them out of your template.  Another option is to create a few different Customization Specifications, which can allow joining different AD domains or leveraging multiple SysPrep answer files for additional settings like time zone.  Customization Specifications can also contain different RunOnce commands, which can be nice for simple tasks like running a quick agent install.  When using RunOnce, just keep in mind that vCenter doesn’t wait for the RunOnce commands to complete before reporting back that the VM clone operation has finished.  This means you’ll need to manually check the console of the VM if you want to make sure your RunOnce commands have finished successfully and without errors.

And last but not least, you can leverage your people skills.  For example, if you have 95 apps that are OK with SP1, and 5 apps that don’t support SP1, then convince the app vendors (and the app owners within your organization) to get them upgraded!  Hey, I didn’t say this was going to be easy!

Path B: Shifting content from Templates to Automation

Are the solutions and examples I offered in the previous approach not scalable enough for your environment?  Are your applications with unique requirements on the scale of tens or hundreds of servers, as opposed to the handful I mentioned previously?  Is RunOnce’s basic, freewheeling execution of commands unable to handle all the extra scripts and logic you need to keep your templates non-app specific?  If so, it sounds like you’re dealing with a large-scale environment, with a lot of diversity.  With any luck, your organization is already using a server automation tool to help manage it.

Bringing a server automation tool into the mix doesn’t change the goal of one template per OS.  It does, however, add a lot of additional capabilities, which will change your approach a bit.  Such a tool should be able to reliably deliver your installations in script-based automation packages (referred to by various names, depending on the tool).  It should also be able to handle complex deployment logic and error handling, and to house and replicate all of your install files across multiple environments or data centers.  A powerful automation tool can be used to manage your entire post-template deployment process.  Don’t get me wrong – you’ll still use GPOs and Customization Specifications, but the majority of your content and installations can be moved into the server automation tool.

In addition to moving post-template content, much of the template content I described in Path A, above, can also be moved into the server automation tool.  In this scenario, the template contains just the bare minimum OS and agents – often referred to as a “vanilla” template.  The advantage to shifting content out of the template comes when you need to update an agent or other piece of software within your deployment: it’s a lot easier to change one file or script line in the server automation tool than it is to crack open the whole template.  This approach not only reduces the effort to upgrade individual components, but also reduces the frequency of your template updates – sometimes drastically.

Typically, contents of vanilla templates are limited to:

  • The latest VMware Tools
  • The most recent OS service packs and patches that your organization has approved for general rollout
  • Your server automation tool’s agent, for deploying all the subsequent installations

I’ll explain why each of these items should be included in a vanilla template.  First, you’ll want the latest VMware Tools in there for drivers and because troubleshooting any post-template deployment issues would be painful without them.  Also, some of your server automation tool’s scripts may require them to be present.  OS service packs are there because they take a very long time to install, and they blow up every now and again, so it’s best to bake them in before anything else gets installed.  Patches are there for security, so that you’re not bringing a completely unpatched system onto the network during the deployment process.  And finally, you need your server automation tool’s agent so that you can deploy all of your post-template content to the VM.

Temporal Sprawl

One of the concepts I just touched on was OS patching.  Often when I mention this to customers, the next question that comes up is: how often should we be cracking open our templates to apply new patches?  Well, if your organization’s security policy requires that no computer can be on the network without the latest approved patches, even for just a few minutes while deploying a new VM, then the answer is simple: update every month.  Fortunately, such policies usually also have a grace period of two or three weeks to allow time for regression testing the updates.

If, on the other hand, your organization’s security policy is OK with newly deployed VMs being powered on for a little while with less than the latest approved patch level, then you have some flexibility.  The simplest option is to take advantage of mechanisms within your patching or configuration management tools to ensure that newly deployed VMs will be patched shortly after coming online.  If you are using a server automation tool to manage your overall post-template deployment process, the patching step can be inserted at a specific point in the process, ahead of other, less-critical installations.

Even if your security team doesn’t require you to patch your templates every month, one way or another you’re still going to be on the hook for updating templates at least once or twice a year.  This means that over time you’ll be releasing multiple versions of each OS template.  I’ve seen a lot of customers handle new version releases simply by converting a template to a VM, making updates, and then converting it back to a template.  I would caution against this approach.  Every so often, problems with an OS function or an application will crop up shortly after a template update, which will raise suspicions of the new template being the cause.  Regardless of whether it truly is the root cause of the problem or not, you’re going to need access to the previous version of the template for testing, so be sure to archive these somewhere.  If you can provide whoever is doing the troubleshooting with both versions of the template, then you can either allay their suspicions, or simply revert back to the known-good template.

Since you’re going to be in the business of keeping and managing multiple versions of templates, I would recommend putting some thought into your versioning/release process.  For starters, I would suggest having a version number for each template, and including that number in your template naming standard.  For example, your naming standard could be something like <os>-<os_ver>-XX.YY, which would look something like Win-2008R2-01.00.  I typically update the number to the right of the decimal for minor changes like security patches, and update the major version number to the left for more significant changes like an OS service pack or a new VM Hardware version.  Also, if you have any intention (current or future) of using scripts to help manage your templates, you’ll make things a lot more script-friendly if you consistently use all of the digits in your version numbers.  I’ve seen more than one script produce bad results because X.10 sorts before X.9 when names are compared as text, so don’t forget the extra 0 (X.09)!
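Here is a small Python sketch of that sorting pitfall, assuming the <os>-<os_ver>-XX.YY naming standard described above.  The function names and the regular expression are my own illustration, not part of any VMware tooling.

```python
import re

# Assumed naming standard from the text: <os>-<os_ver>-XX.YY, e.g. Win-2008R2-01.00
NAME_PATTERN = re.compile(
    r"^(?P<os>[^-]+)-(?P<os_ver>[^-]+)-(?P<major>\d+)\.(?P<minor>\d+)$"
)


def parse_template_name(name):
    """Split a template name into (os, os_ver, major, minor)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"template name does not follow the standard: {name!r}")
    return (match["os"], match["os_ver"], int(match["major"]), int(match["minor"]))


def latest_template(names):
    """Return the name with the highest numeric version (names assumed to share one OS)."""
    return max(names, key=lambda name: parse_template_name(name)[2:])


unpadded = ["Win-2008R2-1.9", "Win-2008R2-1.10"]
padded = ["Win-2008R2-01.09", "Win-2008R2-01.10"]

# A plain text sort compares character by character, so "1.10" lands before "1.9".
print(sorted(unpadded)[-1])
# With zero padding, the text sort and the numeric order agree.
print(sorted(padded)[-1])
# Parsing the numbers gives the right answer either way.
print(latest_template(unpadded))
```

Padding is the cheap fix because it protects every script and every sorted list in the vSphere Client without anyone having to write a parser.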

Another suggestion that I make is to use some sort of automated process to install all the OS/content when you create your templates.  This doesn’t have to be an expensive server automation tool like we discussed above.  In fact, free OS-included deployment tools like Microsoft Deployment Toolkit (MDT) or Kickstart will work just fine for this purpose.  The benefits of this approach are twofold: 1) to simplify the update process and 2) to remove manual steps from the process for greater consistency and user confidence.  Need to upgrade an agent in a template?  Simple – just update the file or command in your OS deployment tool, and rerun the deployment to create a fresh, artifact-free template.  Is a panicked co-worker accusing you of fat-fingering a step in the update process?  Easily debunk the accusation by showing them that the process is automated.  Heck, with a couple of clicks you can completely recreate the template for them.  Using an OS deployment tool also makes documenting the contents of each template version a snap – in a lot of cases you can copy/paste text descriptions or take screenshots in the tool for your documentation.  And lastly, if you’re doing a lot of template updates (monthly, for example), I also recommend archiving your past OS deployments along with your past templates, and tracking them both with the same version number.

You got your Templates in my vCloud!

Like chocolate and peanut butter, they’re better together!  If you’re using vCloud Automation Center (vCAC) as your IaaS platform, then you’ll want to factor that into your template versioning/release process.  When vCAC manages an organization’s server request process, it also becomes the de facto point of control for templates.  Whether or not your users can request a VM based on a particular template or Customization Specification depends on whether or not a service architect has specified them in some of the blueprints that make up the request catalog.  This puts vCAC in a great position to support your template versioning/release process.  If a template is in the midst of being updated or tested, you can make sure it’s not in any published/entitled blueprints until it’s ready.  Once it’s ready, you can swap it into a blueprint with just a few clicks.  Or if you have a problem with a recently updated template, you can swap back the old one just as easily.

Beyond just controlling when new templates are available for users to request, vCAC blueprints and their associated workflows also specify your post-template deployment (among other things!) by calling out to Customization Specifications and other systems such as a server automation tool.  If you’ve created blueprints for specific types of application servers, they will also include the subsequent application installations.  With all of these components having to work together to build the user’s requested VM, some organizations decide that it makes sense to treat them as a single entity.  Whenever they update one of the components, they test and certify the entire VM build process, end-to-end.  Each component should have its own version number, and the entire entity of composite components should have a collective version number as well.  I like to record this collective version number as a custom property within the blueprint.  I also like to document the versions of the individual components as blueprint custom properties, so that I have all the version numbers stamped into every VM in vCAC.  That’s great info to have handy for scripts or when you’re troubleshooting an issue and need to know a VM’s makeup.

As of v6.0.1, vCAC blueprints reference templates by name, so whenever you create an updated template with a new version number, you’ll need to update the blueprint to reflect the new name.  Also, while vCenter will let you create templates with the same name, vCAC blueprints expect templates to have unique names.  Requests for a new VM will error out if its blueprint references a template name that could refer to 2 different objects in the same Datacenter within vCenter.
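A quick pre-flight check for ambiguous names can save you a failed request.  Below is a minimal Python sketch that assumes you have already exported your template inventory as (datacenter, template name) pairs, from a report or a script of your choosing; the datacenter and template names are hypothetical, and the check treats names as duplicates only when they match exactly.

```python
from collections import Counter


def duplicate_template_names(templates):
    """Return sorted (datacenter, name) pairs that appear more than once.

    `templates` is an iterable of (datacenter, template_name) pairs.
    """
    counts = Counter(templates)
    return sorted(pair for pair, count in counts.items() if count > 1)


# Hypothetical inventory export.
inventory = [
    ("DC-East", "Win-2008R2-01.00"),
    ("DC-East", "Win-2008R2-01.00"),  # same name again, e.g. in a different folder
    ("DC-West", "Win-2008R2-01.00"),  # same name, but a different Datacenter
    ("DC-East", "Win-2012-01.00"),
]

duplicates = duplicate_template_names(inventory)
for datacenter, name in duplicates:
    print(f"Ambiguous template name in {datacenter}: {name}")
```

Only the DC-East pair is flagged here, since the copy in DC-West lives in a different Datacenter.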

Since vCAC environments often span multiple vCenters/clouds, let’s talk briefly about replicating templates between environments.  vCloud Connector has a Content Sync feature that allows you to sync content catalogs, which can include templates, via a publisher/subscriber model.  You can also sync catalogs with your public/hybrid cloud environments.  As of v2.6 the basic and advanced editions have been merged into one free tool.

And finally, since we’re talking about replicating templates across environments, I would be remiss if I didn’t also touch on where templates should live within each individual environment.  The best practice within each vCenter is to isolate templates to a datastore dedicated to that purpose.  This splits the I/O during cloning between the source and destination datastores, and protects the running VMs from high-I/O cloning operations.  While it’s OK to have a running VM on that datastore as part of your template creation process, you should generally avoid having any other VMs running on that datastore.

Now go get ‘em!

Template sprawl is a problem that starts off innocently enough – we just want to take advantage of a feature to make our jobs easier.  Unfortunately, it tends to creep up on us and get out of control over time.  But take heart – with a little forethought, we can take control of our template destiny, and put an end to the sprawl.