Problem Management with vCenter Operations: Dealing with Events and Incidents Before They Impact Users

By: Pierre Moncassin

In some more traditional IT environments, if you have “problem manager” anywhere near your job title, you are probably faced with formidable challenges.

Let me guess… your mission is to steer the IT infrastructure clear of forthcoming issues – sometimes referred to as root causes – that will lead to incidents. Most of the time, though, you can only see what occurred in the past. To take a page from the famous TV series, an incident has occurred and detective Columbo is called to the scene. What has occurred, he asks. Is there a pattern? Did anyone notice other incidents occurring around the same time?

That kind of thing you can probably do in your sleep. But however talented a detective you may be, this fact remains: You likely have little visibility into future incidents. You see some clues scattered around (also known as alerts), but these alerts cannot be readily interpreted without hours of manual work.

Fortunately, a tool like vCenter Operations Manager allows you to accelerate the scenario for Problem Management. Think of it as an assistant that can connect all the clues together and link them to potential suspects (root causes). The groundwork is done for you so that you can focus on the truly proactive work.

But vCenter Operations Manager pushes the envelope even further. Proactive analytics can detect impending outages before users are impacted. In detective terms, not only can you identify the suspects faster, you get an advance notice on their next move.

Now enough theory – let’s see how that works in practice.


Fig 1.
First off, let us look at the Health Badge (Fig. 1), which is built in as standard with vCenter Operations Manager. It is a dashboard that gives you instant visibility into the current state of the infrastructure. You can not only identify immediate issues but also use proactive capabilities like the risk badge to detect which areas of the infrastructure might fail in the future. In a nutshell: You don’t need to wait for an outage before responding.


Fig 2.
Another way to identify potential issues is by setting up Early Warning Smart Alerts in vCenter Operations Manager. These are alerts designed to tell you that some infrastructure components underpinning your cloud services are not operating “normally”. Unless it’s a traditional incident/response scenario, your overall service may well be operating perfectly fine – but the alert tells you that an issue will soon need attention and gives you a chance to be proactive about it.

vCenter Operations Manager deploys advanced analytics to determine whether a component is operating within a “norm.” For now, it’s enough to say that once vCOps detects “abnormal” components beyond a certain threshold, an Early Warning Smart Alert is issued. It is the signal for the detective (a.k.a. the Problem Manager) to start investigating.
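To make the idea of a “norm” and a threshold more concrete, here is a minimal sketch in Python. It is not vCOps’ actual analytics, which are considerably more sophisticated. It simply assumes that each metric’s normal range is its historical mean plus or minus a few standard deviations, and that an early warning fires once a given fraction of metrics falls outside that range. The function names, the metric names and the parameters `k` and `abnormal_fraction` are all hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(history, current, k=3.0):
    """Return True if `current` lies more than k standard deviations
    from the mean of `history` (an assumed, simplified 'norm')."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(current - mu) > k * sigma


def early_warning(metrics, abnormal_fraction=0.5, k=3.0):
    """Decide whether to raise an early warning.

    `metrics` maps a metric name to (history, current_value).
    Returns (alert, abnormal_names): alert is True when the share of
    abnormal metrics reaches `abnormal_fraction`.
    """
    if not metrics:
        return False, []
    abnormal = [
        name
        for name, (history, current) in metrics.items()
        if is_abnormal(history, current, k)
    ]
    alert = len(abnormal) / len(metrics) >= abnormal_fraction
    return alert, abnormal


# Hypothetical sample data: recent readings plus the latest value.
sample = {
    "cpu_percent": ([40, 42, 41, 39, 40], 95),
    "mem_percent": ([60, 61, 59, 60, 62], 61),
    "disk_latency_ms": ([5, 6, 5, 4, 5], 30),
}
alert, suspects = early_warning(sample)
```

In this sample, two of the three metrics sit far outside their usual range, so the sketch raises the warning and names those two metrics as the first suspects to investigate.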

As soon as a potential issue is identified, you can drill into potential root causes (as shown in Fig. 2, right-hand side). It is only a short step then from detection to active prevention and remediation. If the vCenter Configuration Manager (vCM) toolset is also deployed, you can directly access the virtual infrastructure configuration and review what recent change events have occurred. If the issue is related to a known change event within vCM, you may be able to roll back the change with a single command.
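The reasoning behind linking an alert to a recent change can be sketched in a few lines of Python. This is purely illustrative and does not use any vCM API: the event records, their field names and the 24-hour look-back window are assumptions made for the example.

```python
from datetime import datetime, timedelta


def recent_changes(change_events, alert_time, window=timedelta(hours=24)):
    """Return the change events that occurred within `window` before
    `alert_time`, newest first. Each event is a dict with a 'time' key."""
    candidates = [
        event
        for event in change_events
        if alert_time - window <= event["time"] <= alert_time
    ]
    return sorted(candidates, key=lambda event: event["time"], reverse=True)


# Hypothetical change log and alert timestamp.
changes = [
    {"id": "chg-1", "time": datetime(2013, 5, 1, 9, 0), "what": "resized datastore"},
    {"id": "chg-2", "time": datetime(2013, 5, 3, 14, 30), "what": "changed VM memory limit"},
    {"id": "chg-3", "time": datetime(2013, 5, 3, 20, 0), "what": "patched host"},
]
alert_raised = datetime(2013, 5, 3, 22, 0)
suspect_changes = recent_changes(changes, alert_raised)
```

Changes made shortly before the alert come out at the top of the list, which is where a Problem Manager would normally start looking for a candidate to roll back.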

In summary, the toolsets not only accelerate detection, they also allow you to take appropriate preventative actions.

Right, but is it always that easy? Not always, of course. There are situations where there are so many alerts triggered (e.g. “Alert Storms”) that the root cause becomes harder to identify. But again, the good news is that there are known ways to cut down the noise – see our earlier blog, “Tips for Using KPIs to Filter Noise with vCenter Operations Manager” for more details.

The bottom line is that if you are a Problem Manager using vCenter Operations Manager, you will see your work increasingly shifting from reactive to proactive tasks. This is because you can let automation do the groundwork. (I digress a little here, but you will find that the same happens across many traditional IT roles when moving to a vCloud Infrastructure. Less time spent on physical-world “nuts and bolts” frees more time for proactive planning. By the way, if you are curious to see how the roles evolve, check out our “Organizing for the Cloud” white paper.)

In conclusion, here are three technical reasons why VMware vCenter Operations Manager will be a game-changer for you:

  • You will accelerate root cause analysis with instant drill-down access into infrastructure issues that may impact your overall services.
  • You get a comprehensive view of the infrastructure situation via visual summaries, like the Health dashboards.
  • Last but not least, you leverage proactive analytics to get an early notice of impending incidents. Now that is something that even detective Columbo did not have.

Follow @VMwareCloudOps on Twitter for future updates, and join the conversation by using the #CloudOps and #SDDC hashtags.

Transforming IT Services is More Effective with Org Changes

By: Kevin Lees

Last time, I wrote about the challenge of transforming a traditional IT Ops culture and the value of knowing what you’re up against.

Now I want to suggest some specific organizational changes that – given those cultural barriers – will help you successfully undertake your transformation.

At the heart of the model I’m suggesting is the notion of a Cloud Infrastructure Operation Center of Excellence. What’s key is that it can be adopted even when your org is still grouped into traditional functional silos. 

Aspiration Drives Excellence

A Cloud Infrastructure Operation Center of Excellence is a virtual team made up of the people occupying your IT org’s core cloud-focused roles: the cloud architect, cloud analyst, cloud developers and cloud administrators. They understand what it means to configure a cloud environment, and how to operate and proactively monitor one. They’re able to identify potential issues and fix them before they impact the service.

Starting out, each of these people can still be based in the existing silos that have grown up within the organization. Initially, you are just identifying specific champions to become virtual members of the Center of Excellence. But they are a team, interacting and meeting on a regular basis, so that from the very beginning they know what’s coming down the pipe in terms of increased capacity or capability of the cloud infrastructure itself, as opposed to demands for individual projects.

Just putting them together isn’t enough, though. We’ve found that it’s essential to make membership of the cloud team an aspirational goal for people within the IT organization. It needs to be a group that people want to be good enough to join and for which they are willing to improve their skills. Working with the cloud team needs to be the newest, greatest thing.

Then, as cloud becomes more prominent and the de facto way things are done, the Cloud Center of Excellence can expand and start absorbing pieces of the other functional teams. Eventually, you’ll have broken down the silos, the Cloud Center of Excellence will be the norm for IT, and everybody will be working together as an integrated unit.

Four Steps to Success

Here are four steps that can help ensure that your Cloud Infrastructure Operation Center of Excellence rollout is a success:

Step 1 – Get executive sponsorship

You need an enthusiastic, proactive executive sponsor for this kind of change. Indeed, that’s your number one get – there has to be an executive involved who completely embraces this idea and the change it requires, and who’s committed to proactively supporting you.

Step 2 – Identify your team  

Next you need to identify the right individuals within the organization to join your Center of Excellence. IT organizations that go to cloud invariably already run a virtualized environment, which means they already employ people who are focused on virtualization. That’s a great starting point for identifying individuals who are best qualified to form the nucleus of this Center. So ask: Who from your existing virtualization team are the best candidates to start picking up responsibility for the cloud software that gets layered on top of the virtualized base?

Step 3 – Identify the key functional teams that your cloud team should interact with

This is typically pretty easy because your cloud team has been interacting with these functional teams in the context of virtualization. But you need to formalize the connection and identify a champion within each of these functional teams to become a virtual member of the Center of Excellence. Very importantly, to make that work, the membership has to be part of that person’s job description. That’s a key piece that’s often missed: it can’t just be on top of their day job, or it will never happen. They have to be directly incentivized to make this successful.

Step 4 – Sell the idea

Your next step is basically marketing. The Center of Excellence and those functional team champions must now turn outward to the rest of IT and start educating everybody else – being very transparent about what they’re doing, how it has impacted them, how it will impact others within IT and how it can be a positive change for all. You can do brown bag lunches, or webinars that can be recorded and then downloaded and watched. Whatever the format, you need some kind of communication and marketing effort to educate the others within IT on the new way of doing things, how it’s been successful, and why it’s good for IT in general to start shifting their mindset to this service orientation.

Don’t Forget Tenant Operations 

There’s one last action you need to put in place to really complete your service orientation: create a team that is exclusively focused outwards toward your IT end customers. It’s what we call Cloud Tenant Operations.

Tenant Ops, also called “Service Ops,” is one of the three Ops tiers that enable effective operations in the cloud era. The three tiers are outlined here and here.

One of the most important roles in this team is the customer relationship (or sometimes ‘collaboration’) manager who is directly responsible for working with the lines of business, understanding their goals and needs, and staying in regular contact with them, almost like a salesperson, and supporting that line of business in their on-boarding to, and use of, the cloud environment.

They can also provide demand information back to the Center of Excellence to help with forward capacity planning, helping the cloud team stay ahead of the demand curve by making sure they have the infrastructure in place when the lines of business need it.

Tenant Operations is really the counterpart to the Cloud Infrastructure Operation Center of Excellence from a service perspective. It needs to include someone who owns the services offered out to the end customers over their life cycle, a service architect, and service developers who can actually understand the technical implications of the requirements. These requirements are coming from multiple sources, so the team needs to identify the common virtual applications that can be offered out and consumed by multiple organizations (and teams within organizations) as opposed to doing custom one-off virtual application development.

In a sense, Tenant Operations functions as the DevOps team from a cloud service perspective and really instantiates the concept of a service mindset, becoming the face to the external end users of the cloud environment.

These Changes are Doable

The bottom line here: transforming IT Ops is doable. I have worked with many IT organizations that are successfully making these changes. You can do it too.

Additional Resources

For a comprehensive look at how to best make the transition to a service-oriented cloud infrastructure, check out Kevin’s white paper, Organizing for the Cloud. 

Also look for VMware Cloud Ops Journey study findings later this month, which highlight common operations capability changes and the drivers for those changes. For future updates, follow us on Twitter at @VMwareCloudOps, and join the conversation by using the #CloudOps and #SDDC hashtags.