
With vRealize Operations 8.1 we delivered some great updates to the alerting framework to cut through alert noise, simplify alert creation, improve notifications and make it easier to manage alert assignments. If you have spent any time at all with monitoring solutions, you know that alert fatigue is real. With these updates, you will see how alerts can work for you instead of the other way around. So, join me for a walkthrough of what’s new for alerts in vRealize Operations 8.1 – I think you will like what we have done!

Introducing Alert Ranking to Combat Alert Fatigue

I mentioned “alert fatigue” in the intro, and it is a real problem. Even with a well-tuned monitoring tool you can quickly accumulate hundreds of alerts of various criticality levels. But which alerts are truly important?

For example, in the screen shot above you can see I have got a lot of active alerts for virtual machines in our lab. Criticality is determined by the alert definition and symptom settings, but a criticality of “warning” on a production workload is likely more important than a “critical” alert for a non-production workload.

Additionally, some alerts impact risk rather than health. In vRealize Operations, risk is a category that indicates a proactive measure should be taken to prevent problems. Health, on the other hand, indicates an immediate problem that needs to be addressed.

It can be daunting to try and figure out where to focus attention. A virtual machine with “PROD” in the name has a warning for risk due to old snapshots, but we can see another virtual machine has a critical health alert for CPU usage at 100%. Which do I tackle first?

With vRealize Operations 8.1, we solve that problem with a new AI-powered capability to assign an “Importance” score to alerts so you can more clearly understand where you should focus attention. Notice the last column in the screen shot below. With the alerts sorted by Importance you can quickly see which may be more important to review. How does this work?

The Alert Importance is derived by giving higher priority to:

  • Rare alerts, which occur with lower frequency and across fewer objects within the topology
  • Alerts on frequently accessed objects

So, what makes an alert “rare”? Periodically, vRealize Operations evaluates the alert history for each alert definition and compares it with all other alerts triggered. If an alert definition results in alerts that:

  • Have a longer duration
  • Trigger on more objects

then they are given a lower importance score. The idea is that these alert definitions generally result in noise and there is either some problem with the alert definition (it is too broad or unreasonable) or the policy settings for the environment need to be better refined (for example, alerts that should only be of interest for production versus dev/test).

Higher ranking is also given if the alert triggers on a frequently accessed object. This is a gauge of user interest in each object as compared to all monitored objects in vRealize Operations. Accesses can be clicks, searches, widget interactions and so forth. More accesses to an object increase its “user interest” and thus contribute to a higher Alert Importance score.
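To make the idea concrete, here is a toy sketch of how rarity and user interest could be combined into a single score. It is not the algorithm vRealize Operations uses: the `AlertStats` fields, the weights, the normalization and the sample numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlertStats:
    """Hypothetical per-alert inputs; field names are illustrative only."""
    name: str
    objects_affected: int      # objects the alert definition triggered on
    avg_duration_hours: float  # how long its alerts typically stay active
    object_accesses: int       # clicks, searches, etc. on the affected object


def importance(alert, all_alerts, rarity_weight=0.6, interest_weight=0.4):
    """Return a 0-100 score: rarer alerts on more-accessed objects score higher."""
    # Normalize each input against the largest value seen (fall back to 1 if all are 0).
    max_objects = max(a.objects_affected for a in all_alerts) or 1
    max_duration = max(a.avg_duration_hours for a in all_alerts) or 1
    max_accesses = max(a.object_accesses for a in all_alerts) or 1

    # "Noise" grows when alerts spread across many objects and stay open a long time.
    noise = (alert.objects_affected / max_objects
             + alert.avg_duration_hours / max_duration) / 2
    rarity = 1.0 - noise
    interest = alert.object_accesses / max_accesses
    return round(100 * (rarity_weight * rarity + interest_weight * interest), 1)


# Made-up sample data for two alerts.
alerts = [
    AlertStats("Old snapshots", objects_affected=120,
               avg_duration_hours=400, object_accesses=5),
    AlertStats("CPU at 100%", objects_affected=2,
               avg_duration_hours=3, object_accesses=40),
]

# Sort so the highest-scoring alert comes first.
ranked = sorted(alerts, key=lambda a: importance(a, alerts), reverse=True)
for a in ranked:
    print(a.name, importance(a, alerts))
```

With these sample numbers, the widespread, long-running snapshot alert sinks to the bottom and the rare CPU alert on a frequently viewed object rises to the top, which is the behavior described above.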

Let us look at those alerts again with the “Importance” scores visible in vRealize Operations 8.1.

Much better, isn’t it? Using AI, vRealize Operations filters the alert noise for you so you can quickly focus on the important alerts first. This saves you time, helps you deliver better service to your business and goes a long way toward reducing alert fatigue.

Easier Alert User Assignment

Now that you know which alerts have the most importance, wouldn’t it be useful to assign an alert to some other user for action? Alert ownership has been a feature of vRealize Operations for many years. Users could review alerts and take ownership but could not assign the alerts to others. That changes in version 8.1 with the new “Assign to…” option on the alert action menu. Search for a user and select them to make the assignment. You can still take ownership of alerts as well.

Of course, you can also release ownership for anyone assigned.

New Alert Definition Workflow

With each release, we strive to make vRealize Operations easier to use. This time the focus on usability is with Alert Definition management. We have a new, simplified workflow and a couple of updates to alert notification rules and recommendations that will make your life easier.

First, we have made it easier to view existing alert definitions to see the symptoms, recommendations, notification rules and policy assignments. Just double-click on an Alert Definition to see the details.

The process to create (or edit) an Alert Definition has been totally refactored to make the flow much more logical and intuitive. You will follow a step-by-step guide to add symptoms, recommendations, assign to policies and apply notification rules.

The editor shows you where alert components may already be used, as shown in the screen shot above. This is helpful to avoid creating duplicate alerts.

Notice that you have the option to create new symptoms if needed. This is also the case with the other components, for example recommendations as shown below.

And once you create the new component it is listed first so you do not have to search for it. Just drag it over to the recommendations priority list.

Policies are important to effective alert management. They help reduce alert noise by limiting alert triggers to the objects that need them, such as risk alerts for production systems or PCI compliance alerts for applications that process credit card transactions.

With the new editor, you can now manage which policies apply to the alert definition as you create it, rather than going into the policy afterwards.

Finish by adding notification rules and you have created a new Alert Definition. I personally find this workflow much easier and more intuitive and I think you will, too.

New Multi-Definition Notifications

Previously, you could create an alert notification and assign a single alert definition so that any alerts based on that definition would trigger the notification. However, that was a bit limiting and required multiple instances of a notification if you wanted to use it with other alert definitions.

Now in vRealize Operations 8.1 you can assign multiple alert definitions to a single notification rule. This will greatly simplify your notification rule management.

In the screen shot above, I have a notification rule with five alert definitions associated. This makes it a lot easier to route alerts to the right group. Or, if you are using the vRealize Orchestrator Management Pack this can be very helpful to launch specific workflows based on a set of alert definitions.
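Conceptually, the change is from a one-to-one mapping to a one-to-many mapping between a notification rule and alert definitions. The sketch below models that idea in plain Python. It is not a vRealize Operations API, and the class, rule, target and alert definition names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationRule:
    """Hypothetical model of a rule that covers several alert definitions."""
    name: str
    target: str  # e.g. a team channel or a workflow name
    alert_definitions: set = field(default_factory=set)

    def matches(self, alert_definition):
        # A rule applies if the alert's definition is in its set.
        return alert_definition in self.alert_definitions


def targets_for(alert_definition, rules):
    """Return the targets of every rule that covers this alert definition."""
    return [rule.target for rule in rules if rule.matches(alert_definition)]


# One rule now covers several definitions instead of one rule per definition.
rules = [
    NotificationRule(
        name="VM performance",
        target="#vm-ops",
        alert_definitions={"High CPU usage", "High memory usage", "Disk latency"},
    ),
    NotificationRule(
        name="Housekeeping",
        target="#vm-hygiene",
        alert_definitions={"Old snapshots"},
    ),
]

print(targets_for("High CPU usage", rules))
print(targets_for("Old snapshots", rules))
```

Grouping definitions by the team or workflow that handles them keeps the number of rules close to the number of destinations rather than the number of alert definitions.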

Did You Say, “Slack Notification?”

If you have a sharp eye, you may have noticed the rule is using a new outbound plugin for Slack. Many customers use Slack to communicate internally, as we do here at VMware. Having alert notifications sent to a specific Slack channel helps you organize responses to alerts and discuss remediation activities with the right team.

To get started, you will need to set up an incoming webhook in Slack for your channel. Once you have the URL, you can create an outbound adapter instance (Administration > Manage > Outbound Plugins) and test the URL.
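If you want to confirm the webhook works independently of vRealize Operations, you can post a message to it yourself. The sketch below uses only the Python standard library and the basic `{"text": ...}` JSON body that Slack incoming webhooks accept. The function names, message layout and example link are my own placeholders, not the format the Slack plugin sends.

```python
import json
import urllib.request


def build_slack_payload(alert_name, criticality, object_name, alert_url):
    """Build a minimal incoming-webhook body; the message layout is illustrative."""
    # <url|label> is Slack's syntax for a labeled link.
    text = f"[{criticality}] {alert_name} on {object_name}\n<{alert_url}|Open alert>"
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values; replace them with your own alert details.
payload = build_slack_payload(
    alert_name="CPU usage at 100%",
    criticality="Critical",
    object_name="example-vm",
    alert_url="https://vrops.example.com/placeholder-alert-link",
)
print(json.dumps(payload))

# Uncomment and paste your own webhook URL to send the test message:
# post_to_slack("<your incoming webhook URL>", payload)
```

Treat the webhook URL like a credential, since anyone who has it can post to your channel.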

The Test Webhook URL is not saved with the Slack Plugin instance. Instead, I draw your attention to the previous screen shot of the Notification Rule. When the Slack plugin is selected, you will be prompted for the incoming webhook URL you created for the Slack channel where your alert notifications will be posted.

And here is what the alert notification looks like in Slack.

The notification includes a handy link directly to the alert in vRealize Operations. How neat is that?

Fight Alert Fatigue with vRealize Operations 8.1

I think you will agree that this release changes the game for taming alert fatigue. Experience the power of Self-Driving Operations powered by AI for yourself by downloading a trial of vRealize Operations 8.1 or signing up for a trial of vRealize Operations Cloud for the same capability with faster time to value.