Secure Healthy Systems with vROps Alerts
By: Cameron Jones, Blue Medora
In any enterprise monitoring tool, alerting is a primary focus, and vRealize Operations Manager (vROps) is no exception. Combining external management packs with vROps alerting helps ensure the health not only of your VMware environment but of your entire infrastructure. To see how alerts work, let’s look at an environment running the vROps Management Pack for Cisco UCS.
Figure 1 – Environment Overview of VMware running on UCS
In Figure 1, we can see a small VMware environment hosted on Cisco UCS. Most of the environment is healthy, but a few of the underlying UCS components are not. Most notably, the UCS chassis is in a Health-Immediate state. Let’s examine that further.
Figure 2 – Step-by-step recommendations on how to fix an unhealthy chassis
On the Alerts tab, we can select the alert to read more about it. In Figure 2, we can see that the chassis has a power problem. The fault details explain, “This fault typically occurs when the chassis fails to meet the minimal power requirements…” vROps also gives us critical information on how to resolve the alert; in this case, the recommendations come directly from Cisco’s own documentation. Step 1 is housekeeping, so let’s skip to Step 2, which tells us to “verify all the PSUs for the chassis are functional.” As you might recall from Figure 1, three of our power supplies also had issues. Without leaving vROps, we can determine that the chassis is one PSU failure away from crashing, which would take our entire VMware environment down with it. Next, let’s navigate to the power supplies to see what is going on.
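The “one PSU failure away” conclusion comes down to simple arithmetic: count the power supplies that are still online and compare that count against the minimum the chassis needs to stay powered. The Python sketch below is purely illustrative. The four-PSU layout, the one-PSU minimum, and the `PowerSupply` and `failures_tolerated` names are hypothetical assumptions, not values or APIs from vROps or Cisco UCS.

```python
from dataclasses import dataclass


@dataclass
class PowerSupply:
    # Hypothetical model of a PSU; not a vROps or UCS object.
    name: str
    online: bool


def failures_tolerated(psus, min_required):
    """Return how many more PSU failures the chassis can absorb while still
    meeting min_required online PSUs (negative means it already cannot)."""
    online = sum(1 for p in psus if p.online)
    return online - min_required


# Hypothetical chassis: four PSUs, three offline, and (by assumption)
# one online PSU is enough to keep the chassis powered.
chassis_psus = [
    PowerSupply("PSU-1", online=True),
    PowerSupply("PSU-2", online=False),
    PowerSupply("PSU-3", online=False),
    PowerSupply("PSU-4", online=False),
]

headroom = failures_tolerated(chassis_psus, min_required=1)
print(headroom)  # 0: the next PSU failure drops the chassis below its minimum
```

A headroom of zero is exactly the situation the alert warns about: the chassis is still running, but it can no longer absorb a single additional power supply failure.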
Figure 3 – Step-by-step recommendations on how to fix an unhealthy power supply
In Figure 3, we can see that each of these power supplies is offline. vROps provides step-by-step recommendations that lead us to check our hardware to resolve the alert. From here, we can verify that our backup power supplies are working properly before our entire system goes down.
Figure 4 – vROps action allowing the admin to give more memory to a stressed VM
Finally, vROps can resolve issues surfaced by alerts through Actions. As another example, in Figure 4 we see a virtual machine with much higher memory utilization than expected. This high memory utilization is causing stress (performance degradation) on the virtual machine. From this alert, we can click ‘Set Memory for VM’ and use the provided action to return the VM to a healthy state. We can now identify and resolve an alert entirely from within vROps.
The VMware framework around alerting is one of the best features in vROps. A single tool gives the admin visibility into unhealthy objects in the environment, shows what is causing each problem, and helps resolve it with step-by-step recommendations or built-in Actions. This simple workflow helps the admin cut troubleshooting time and prevent system-wide outages.
For more information on virtualization and cloud infrastructure, visit the VMware Solution Exchange and check out the vRealize Operations Management platform. For more information on the management pack used in this solution, visit the Blue Medora vRealize Operations Management Pack for Cisco UCS product page.