There are many datacenter monitoring tools out there that tell you that you HAVE a problem (warning! warning! alert!). At one point that was unique, but our expectations and needs have evolved. Today a handful of tools are able to predict problems before they happen and tell you how to fix (remediate) them before they impact your applications. Prediction and suggested remediation are the new minimum expectation. However, there are very few tools out there that will not only predict and suggest remediation but also automate it. What does that mean exactly?
Think about an employee / manager analogy where the employee keeps bringing common problems to the manager, asking the manager what to do. No manager wants to hear about the same problems over and over again when the fix is already known. What would the manager do? The answer might be to write a short script that the employee can execute, on their own, to solve the problem.
In terms of cloud management, you need a tool that knows what can go wrong in your cloud infrastructure and automatically solves it for you, before it impacts applications and, yes, even without notifying you about it in the middle of the night. What you want out of a cloud monitoring / management tool is for it to, let’s say, email you a report in the morning that says “I fixed these problems for you while you were sleeping and the applications were never impacted”.
You may not know already (but might be guessing by now, hence the name of the post) that vRealize Operations Manager can provide automated remediation. I wanted to learn about what vROps automated remediation could do and how it’s configured. Here’s what I found out…
vRealize Operations Automated Remediation – Configuration
You’ve actually been able to automatically remediate through vRealize Operations for some time, but the catch was that you had to use vRealize Automation or Orchestrator to do so. This required some extra configuration and an additional learning curve (if you weren’t already using those tools). If you already have vRA or just want to use it, check out this post and this other post to learn more.
Thankfully, for those who want to remediate automatically without jumping into other tools right now, the good news is that the more recent editions of vRealize Operations support 13 different actions that can be automated (the list of those actions is here). These actions can be automated without any scripting and without vRealize Automation. With 13 different actions (like powering VMs on / off and resizing just about all VM resources) you can automatically resolve many common vSphere performance issues, preventing downtime or application performance problems BEFORE they happen.
VMware’s documentation lists the actions that are currently supported for automation.
So how do you configure automated remediation with vRealize Operations?
I’ll be honest, I learned everything I know about how to configure automated remediation from VMware’s Sunny Dua (@sunny_dua on Twitter) and Simon Eady (@simoneady on Twitter). They’ve been running a series of webinars on vROps that they post on YouTube and on their respective blogs (vXpress and DefineIT).
Thankfully the first webinar that they did in their series was about Building Self-Healing Environments with vROps-
Want to see how to configure automated remediation in video form AND see it in action? Watch that video.
There is also a nice overview of automated remediation in VMware’s Fast Facts video, here-
vRealize Operations Automated Remediation – My Experience
I decided to try to configure automated remediation in my home lab (which is still running in VMware Fusion on a maxed-out iMac – for more info see my video).
Of course you’ll need at least one ESXi host, vCenter, and the latest version of vRealize Operations loaded up. From there, you’ll need to configure the vCenter Actions Adapter instance. VMware’s instructions on how to do that are here. Basically it’s as easy as going into the Administration view in vROps, going into Solutions, selecting the Python Adapter, and clicking Configure.
As you can see in the graphics below, you’ll need to connect the actions adapter to any vCenter servers that you want to perform those actions on by entering the hostname, username, and password for your vCenter servers.
When you’re done, the actions adapter should show a status of Collecting, as you see in the graphic below.
From there, it’s recommended that you create a new role for the actions adapter in vROps and make the admin a member.
If you go to Content and Actions, you can see the list of actions that are available to be automated.
From there, it’s recommended that you create a new policy and apply it only to the objects that you want to automate actions on. In that policy you would view the alert / symptom definitions and set the Automate option to Local for each alert you want automated, as you see below.
So ultimately, what automated remediation does is take alert recommendations (which already exist in your vROps installation) and let you turn on an action that supersedes the recommendation. In other words, vROps triggers an alert (just as it always has), but instead of telling you that, for example, a VM is running low on memory and recommending that you increase it, vROps will actually take that action for you and increase the VM’s memory.
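To make that alert-to-action flow concrete, here is a minimal Python sketch of the idea. It is only a conceptual model, not vROps code or its API: the class names, the add_memory action, and the 1024 MB step are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class VM:
    name: str
    memory_mb: int


@dataclass
class Recommendation:
    text: str
    # Hypothetical: the action linked to this recommendation, if there is one.
    action: Optional[Callable[[VM], str]] = None


@dataclass
class AlertDefinition:
    name: str
    recommendation: Recommendation


def add_memory(vm: VM, step_mb: int = 1024) -> str:
    """Illustrative action: grow the VM's memory by a fixed step (assumed value)."""
    vm.memory_mb += step_mb
    return f"Set memory for {vm.name} to {vm.memory_mb} MB"


def handle_alert(definition: AlertDefinition, vm: VM,
                 automated: Set[str], task_log: List[str]) -> str:
    """Run the linked action if the policy automates this alert, else just notify."""
    rec = definition.recommendation
    if definition.name in automated and rec.action is not None:
        task_log.append(rec.action(vm))  # record what was done for later review
        return "remediated"
    return f"notify: {rec.text}"
```

Here the automated set stands in for the alerts whose Automate option is switched on in the policy. An alert that is not in the set behaves exactly as before: you get the recommendation and nothing is changed.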
Other ways that the automated actions framework might help are automatically turning off idle VMs or downsizing VMs when capacity is low.
Any automated actions that kicked in will be displayed in the recent tasks pane, which you can check out after a good night’s sleep, once you’re settled in at your desk with a cup of coffee the next morning.
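If you wanted that morning review as a short digest rather than a pane to scroll through, the idea could look something like the sketch below. Again, this is an assumption-laden illustration: the Task record and its fields are hypothetical stand-ins for whatever your task history actually contains, and nothing here talks to vROps.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Task:
    # Hypothetical record of one automated action.
    finished: datetime
    target: str
    action: str
    succeeded: bool


def morning_summary(tasks: List[Task], since: datetime) -> str:
    """Summarize the automated actions that finished at or after `since`."""
    recent = sorted((t for t in tasks if t.finished >= since),
                    key=lambda t: t.finished)
    if not recent:
        return "No automated actions ran overnight."
    lines = [f"{len(recent)} automated action(s) ran overnight:"]
    for t in recent:
        status = "OK" if t.succeeded else "FAILED"
        lines.append(f"{t.finished:%H:%M} {t.target}: {t.action} [{status}]")
    return "\n".join(lines)
```

Calling morning_summary with last night’s cutoff time gives you one line per action, oldest first, with failures flagged so you know what still needs a human.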
Want to see automated actions “in action”? Check out Sunny and Simon’s webinar above!