The Case for Upstream Remediation: The Third Pillar of Effective Patch Management for Cloud Computing

By: Pierre Moncassin

Patch Management fulfills an essential function in IT operations: it keeps your multiple software layers up to date, as free of vulnerabilities as possible, and consistent with vendor guidelines.

But scale that to an ever-dynamic environment like a VMware-based cloud infrastructure, and you have an extra challenge on your hands. Not only do the patches keep coming, but end users keep provisioning and amending their configuration. So how to keep track of all these layers of software?

In my experience there are three pillars that need to come together to support effective patch management in the Cloud. The first two, policy and automation, are fairly well established. But I want to make a case for a third: upstream remediation.

As a starting point, you need a solid patching policy. This may sound obvious, but the devil is in the details. Such a policy needs to be defined and agreed across a broad spectrum of stakeholders, starting with the security team. This is typically more of a technical document than a high-level security policy, and it’s far more detailed than, say, a simple rule of thumb (e.g. ‘you must apply the latest patch within X days’).

A well-written policy must account for details such as exceptions (e.g. how to remedy non-compliant configurations); security tiers (which may have different patching requirements); reporting; scheduling of patch deployment, and more.

The second pillar is Automation for Patch Management. While the need for a patching policy is clearly not specific to Cloud Infrastructure, its importance is magnified in an environment where configurations evolve rapidly and automation is pervasive. And such automation would obviously make little sense without a well-defined policy. For this, you can use a tool like VMware’s vCenter Configuration Manager (VCM).

VCM handles three key aspects of patching automation:

Reporting – i.e. verifying patch levels on selected groups of machines
Checking for bulleting updates on vendor sites (e.g. Microsoft)
Applying patches via automated installation

In a nutshell, VCM will automate both the detection and remediation of most patching issues.

However, one other key step is easily overlooked – upstream remediation. In a cloud infrastructure, we want to remediate not just the ‘live’ configurations, but also the templates used for provisioning. This will ensure that the future configurations being provisioned are also compliant. Before the ‘cloud’ era, administrators who identified a patching issue might make a note to update their standard builds in the near future – but there would rarely be a critical urgency. In cloud environments where new machines might be provisioned say, every few seconds, this sort of updates need to happen much faster.

As part of completing any remediation, you also need to be sure to initiate a procedure to carry out updates to your blueprints, as well as to your live workloads (see the simplified process view above).

You need to remember, though, that remediating the images will depend on different criteria from the ‘live’ workload and, depending on the risk, may require a change request and related approval. You need to update the images, test that the updates are working, and then close out the change request.

In sum, this approach reflects a consistent theme across Cloud Operations processes: that the focus of activity is shifted upstream towards the demand side. This also applies to Patch Management: remediation needs to be extended to apply upstream to the provisioning blueprints (i.e. images).

Key takeaways:

Policy and automation are two well-understood pillars of patch management;
A less well-recognized third pillar is upstream remediation;
Upstream remediation addresses the compliance and quality of future configurations;
This reflects a common theme in Cloud Ops processes: that focus shifts to the demand side.

Follow @VMwareCloudOps and @Moncassin on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags on Twitter.

Related Articles

Three Types of Code, a DevOps Model

Why you should consider adding serverless compute capability to existing applications

Use an Outcome-Focused Gap Analysis: A Framework for Application and Cloud Modernization Part 2