Welcome back to this multi-part blog series on vRealize Operations 7.5, where I provide a technical overview of all the new and improved features in this release. The first part covered improvements to Workload Optimization, with new vSAN capability and better visibility into Business Intent policy violations and host group workload. In part two, I showed how Capacity Optimization got the Allocation model back, along with new and improved What-If scenarios, upgrades to costing and more.
In this part, we will look at what’s new in troubleshooting. There’s a lot to talk about, so here we go!
New, Fully Native Application Monitoring
If you use or have ever used Endpoint Operations, you know the visibility into the health and availability of the applications running in your SDDC is invaluable. So, how can we add more value to invaluable?
By bringing you full lifecycle management of agents, including deployment, upgrade, configuration and even uninstall. Getting started with application monitoring in vRealize Operations has never been easier than with 7.5.
In 7.5, Application Monitoring will provide support for Windows and Linux virtual machines, and 17 popular packaged applications. Additionally, remote monitoring is available for HTTP, ICMP, TCP and UDP.
How does it work? Basically, we are now using the Telegraf agent to collect metrics for vRealize Operations. Why Telegraf? Well, if you don’t know already, Wavefront by VMware also uses the Telegraf agent. Aligning on it makes sense, and it also gives vRealize Operations the ability to manage Telegraf either for native vRealize Operations application monitoring or, if you prefer, for Wavefront application monitoring. It’s all about choice, simplicity and shared technology!
To manage agents, you simply deploy the Application Remote Collector (or ARC), a virtual appliance that can support up to 6500 virtual machines (in the “Large” deployment configuration). After deployment, register the ARC with vRealize Operations and select the vCenter servers hosting virtual machines with applications you would like to monitor. At that point, you are ready to start deploying Telegraf. Once the Telegraf agent is deployed, a service discovery runs to find any of the supported applications on the virtual machine.
You can then select the applications you wish to monitor, provide some configuration details and that’s it!
So, you may be asking, “What if I’m using Endpoint Operations today?” Great question, but there’s no need to fear. You can try out the new Application Monitoring capability without disrupting your current monitoring with Endpoint Operations. Both can be used together in the same 7.5 cluster (just not on the same VM, please). This way, you can ease into the new without ripping out the old.
Next, I’m going to talk about two key new capabilities for Troubleshooting that pair very nicely with Application Monitoring.
Advanced Relationships Widget
If I had to select one new feature in 7.5 as my personal favorite, this would be it. The new Advanced Object Relationships widget brings powerful, responsive and easily customized relationship views to various out-of-the-box content as well as your own custom dashboards. Using this new widget is a pleasure, and despite its ease of use it has powerful filtering, search and view options that respond quickly to make troubleshooting (almost) fun!
For example, in the screen capture above, I’m exploring the relationships between a virtual machine running a web application and the infrastructure supporting that virtual machine. This makes it easy to quickly identify how problems or changes in the infrastructure are impacting an application. Or, when it comes to proving your innocence, it gives administrators a great tool to show application owners where their applications are running in the SDDC.
Metric Correlation
Suppose you are troubleshooting high CPU usage on a virtual machine. Wouldn’t it be nice to know if there are other metrics on the virtual machine exhibiting a similar pattern of behavior? Or, let’s suppose CPU Ready % is high on one virtual machine, but you would like to know which other virtual machine peers have the same CPU Ready % metric patterns.
Not only is this available in 7.5, it is also incredibly easy to use, helping you isolate the problem area and get to root cause very quickly during troubleshooting.
For example, suppose I get a call from an application owner asking if I saw any “unusual” CPU usage a couple of days ago in the environment. That’s a broad question, but I can try to narrow things down using vRealize Operations.
First, I’ll search for the application and find out which virtual machines are involved.
There are two virtual machines. Both are currently showing alerts, but let’s not focus on that for now. I will double click on the web server virtual machine to update the relationship view and change my focus to the metrics for that object.
Interestingly, by viewing the CPU|Usage % metric for the web virtual machine, I can see a brief period of increased CPU usage. But what may have caused that? For this, I’ll open the metric chart menu and select Correlation > All self-metrics.
This will give me a list of any metrics showing a direct correlation to the CPU|Usage % metric during the time period (default is 7 days, but you can easily change that from the metric chart menu).
Nice, I can see that Datastore|Outstanding IO requests also had a brief spike at the same time. Looking further, I find that the Network|Data Receive Rate (KBps) also spiked. I can pin both metrics in the Correlation view and those metric charts will be displayed in the All Metrics page so I can do some further troubleshooting.
Now, were there other virtual machines with CPU|Usage % spikes during this same period? I can find out by selecting the Correlation menu option again and this time using Selected metric of all peers. This provides proof that the only other virtual machine on the same host that experienced a CPU spike was the application’s database server.
So, we can confidently go back to the application owner and tell them not only that we found “unusual” activity, but also exactly when it happened, exactly which objects were involved and likely areas to investigate (for example, network metrics indicate an increase in traffic to the web server at that time, so maybe check your activity logs).
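If you are curious what “showing a direct correlation” means in practice, here is a minimal Python sketch of the general idea. It scores each candidate metric against the CPU series with a plain Pearson correlation coefficient and keeps the strong matches. To be clear, this is an illustration only: the article does not say which algorithm vRealize Operations uses, and the sample values, the `correlated_metrics` helper and the 0.8 threshold are all assumptions made up for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no pattern to correlate with
    return cov / sqrt(var_x * var_y)


def correlated_metrics(target, candidates, threshold=0.8):
    """Return (name, score) pairs scoring at or above threshold, strongest first.

    Hypothetical helper; the threshold is an arbitrary illustrative choice.
    """
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    hits = [(name, score) for name, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up samples: a brief CPU spike in the middle of the time window.
cpu_usage = [10, 11, 10, 12, 85, 90, 12, 11]
self_metrics = {
    "Datastore|Outstanding IO requests": [1, 1, 2, 1, 9, 10, 1, 1],
    "Network|Data Receive Rate (KBps)": [100, 110, 105, 100, 900, 950, 110, 100],
    "Memory|Usage %": [40, 42, 39, 41, 40, 38, 41, 43],
}

result = correlated_metrics(cpu_usage, self_metrics)
for name, score in result:
    print(f"{name}: {score:.2f}")
```

With these made-up samples, the datastore and network metrics spike together with CPU and make the list, while memory usage does not. The same scoring idea applies to the peer comparison: swap the dictionary of self-metrics for the CPU|Usage % series of each peer virtual machine.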
Super Metrics Are Super-Duper!
I have never met a vRealize Operations user who doesn’t think Super Metrics are powerful and incredibly useful. On the other hand, questions about “how to create” Super Metrics are among the top requests that I see from our field. So, we made it a whole lot easier to create them with a new Super Metrics Editor.
Not sure which object you need to reference in your formula? Which metrics are available for CPU, memory, networking or storage? What about available functions?
No problem! With autocomplete in the new editor, you simply start typing an object name or object type and the Super Metric Editor will pop up a list of options that match. Keep typing to refine the list and then use Enter to select (you can also activate search using CTRL+Space).
Selecting the object type from the list of suggestions is easy, then use the same method to finish the formula.
Now, once the formula is built and tested, you can assign it to objects and policies directly from the editor. That’s right – it just got much easier to build AND use Super Metrics!
On-the-Fly Super Metrics!
Let’s say you need your new Super Metric for troubleshooting a problem, but you don’t need to collect the Super Metric on an ongoing basis. Now, with 7.5 you can view Super Metrics that have been created, even if they aren’t assigned to a policy, using the Super Metric Preview in All Metrics.
Above you see I’ve selected the Super Metric I created in the last section. What you may also notice is that the Super Metric data spans the entire range of the default 7-day chart view! That’s right, you can use this to apply your Super Metric to historical data, even for periods when the Super Metric didn’t exist yet and was not being calculated.
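Conceptually, the preview works because a Super Metric is just a formula over metrics that are already being collected, so it can be evaluated against stored samples after the fact. Here is a small Python sketch of that idea. It is not how vRealize Operations implements the preview; the `preview_super_metric` function, the object names and the sample values are hypothetical, and a simple average stands in for the formula.

```python
def preview_super_metric(history, formula):
    """Evaluate formula over the samples already stored at each timestamp.

    history maps an object name to {timestamp: value}. Hypothetical structure,
    used only to illustrate applying a formula to existing data.
    """
    timestamps = sorted({t for series in history.values() for t in series})
    preview = {}
    for t in timestamps:
        samples = [series[t] for series in history.values() if t in series]
        preview[t] = formula(samples)
    return preview


def average(samples):
    """Stand-in formula: mean of the samples collected at one timestamp."""
    return sum(samples) / len(samples)


# Made-up CPU usage history for two virtual machines, collected before the
# formula was ever written.
history = {
    "web-01": {1: 20.0, 2: 80.0, 3: 30.0},
    "db-01": {1: 40.0, 2: 60.0, 3: 50.0},
}

preview = preview_super_metric(history, average)
print(preview)  # {1: 30.0, 2: 70.0, 3: 40.0}
```

Because the inputs were collected all along, the derived series covers the whole window even though nothing was computing it at the time.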
Dynamic Custom Properties
If you use Custom Groups to create logical containers, you are going to appreciate this next new feature. What if you could create a custom property and assign a value to it automatically when an object joins a dynamic Custom Group?
For example, say you have a Custom Group that dynamically adds virtual machines which have fewer than 2 vCPUs and less than 2GB of RAM allocated.
You can have the members of this group automatically get a “VM Size” property added and set to the value “hotdog” (the value could also be a number), as you can see below. If a virtual machine is later reconfigured with 2 vCPUs, it will be removed from the Custom Group and the value set to “not hotdog.”
You can use this new Custom Property in dashboards, alerts, views and Super Metrics!
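To make the membership logic concrete, here is a rough Python sketch of the behavior described above. It is only a model of the rule, not product code: the `VirtualMachine` class, the `refresh_group` function and the VM names are hypothetical, and I am assuming that a virtual machine which was never in the group is left without the property.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    vcpus: int
    ram_gb: float
    properties: dict = field(default_factory=dict)


def is_small(vm):
    """Group rule from the example: fewer than 2 vCPUs and less than 2GB of RAM."""
    return vm.vcpus < 2 and vm.ram_gb < 2


def refresh_group(vms):
    """Re-evaluate membership, update "VM Size", and return the member names."""
    members = []
    for vm in vms:
        if is_small(vm):
            vm.properties["VM Size"] = "hotdog"
            members.append(vm.name)
        elif "VM Size" in vm.properties:
            # The VM was a member earlier and has since left the group.
            vm.properties["VM Size"] = "not hotdog"
    return members


tiny = VirtualMachine("tiny-01", vcpus=1, ram_gb=1)
big = VirtualMachine("big-01", vcpus=8, ram_gb=32)

first_pass = refresh_group([tiny, big])
size_when_small = tiny.properties["VM Size"]

tiny.vcpus = 2  # reconfigured later, so it no longer matches the rule
second_pass = refresh_group([tiny, big])
size_after_resize = tiny.properties["VM Size"]

print(first_pass, size_when_small)
print(second_pass, size_after_resize)
```

The key point is that the property value follows group membership automatically, so anything built on it (dashboards, alerts, views) stays current without manual tagging.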
Service Now Integration
Another customer ask has been the ability to forward alerts from vRealize Operations to Service Now to open incidents. In 7.5 this is a native capability that is presented as a new alert notification plugin. This makes it easy to integrate vRealize Operations into your IT operations!
The alert rule using the Service Now Notification plugin includes support for 19 customizable fields such as category, business service, impact, assignment group, assigned to and severity. Incidents can be assigned to groups or individuals without custom assignment policies.
Better, Faster, Easier Troubleshooting
I think you will agree that with all of the above (and even more that I didn’t have the space to mention) your ability to get to root cause and solve issues in your SDDC just got a boost. If you want to see how these improvements can help you, download a trial of vRealize Operations 7.5 and try it out! You can also find more demos and videos on vrealize.vmware.com. Be sure to check out the next blog in this series, where I will cover what’s new in Compliance in 7.5, which brings the ability to create custom compliance templates and automatic remediation of compliance alerts.