Customers trust vRealize Operations for Self-Driving Operations, powered by AI/ML, to deliver the best performance, optimize for efficiency, and troubleshoot quickly. But vRealize Operations doesn’t stop there. If you aren’t already using the application-aware monitoring and troubleshooting capabilities of this solution, this blog will probably change your mind.
In this release (8.0), vRealize Operations adds native discovery of services running on virtual machines in your Software-Defined Datacenter (SDDC), and it automatically creates application containers based on the dependencies between those services. Exciting new capabilities have also been added to the virtual machine Actions menu that let you quickly see the top processes and run scripts in the guest. Finally, Application Monitoring with the Telegraf agent gets updates that bring it closer to parity with Endpoint Operations, which will be removed from vRealize Operations in the next release. Let’s dig into the details.
Native Service Discovery
Previously, the Service Discovery Management Pack (SDMP) could be installed in vRealize Operations to discover services running on monitored virtual machines. The SDMP also provides dependency mapping between services and the ability to automatically create application objects that describe the tiers (e.g., web, app, database). Finally, Site Recovery Manager Protection Groups and Recovery Plans could be validated against discovered applications to make sure all services were protected.
All of that is still true, but Service Discovery is now built into vRealize Operations, so you only need to enable it to begin discovering services in your SDDC.
How much easier? Just toggle a switch in the vCenter Cloud Account and provide credentials for your Windows and Linux guest OSes.
A few things to note here. First, you probably noticed the yellow banner about the required VMware Tools versions. Read KB75122 for details; in brief, this feature requires VMware Tools 11.0.01 or higher for Windows and 10.3.21 or higher for Linux (or open-vm-tools 11.0.1 for Linux).
Let me explain some of the fields and options in the setup. The “Use alternate credentials” option lets you use a different vCenter user account for Service Discovery. Why would you? Keep in mind that this vCenter account is the one that performs Service Discovery through “Execute Program in Guest” in the vSphere APIs. If your environment has strict requirements for account permissions and usage, it’s a good idea to use an account that has permissions for those APIs only.
For the guest OS accounts, the Cloud Account configuration assumes a common account: one Windows account and one Linux account for all monitored VMs. If that isn’t the case in your environment, I’ll cover how to set up accounts for individual VMs in a moment.
For Site Recovery Manager (SRM), vRealize Operations needs an account with permissions to read SRM settings in vCenter.
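If you want to pre-check your inventory before enabling the feature, the version rule above boils down to a numeric comparison. The Python sketch below is a hypothetical helper, not part of vRealize Operations: the function names are illustrative, the minimums are simply the Windows and Linux VMware Tools versions quoted above, and the separate open-vm-tools threshold is not modeled.

```python
# Hypothetical pre-check helper; minimums taken from the versions quoted above.
MINIMUM_TOOLS_VERSION = {
    "windows": "11.0.01",
    "linux": "10.3.21",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '10.3.21' into (10, 3, 21) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def tools_version_ok(guest_family: str, installed: str) -> bool:
    """Return True if the installed VMware Tools version meets the minimum."""
    minimum = MINIMUM_TOOLS_VERSION[guest_family.lower()]
    return parse_version(installed) >= parse_version(minimum)


print(tools_version_ok("Linux", "10.3.10"))   # False
print(tools_version_ok("Windows", "11.1.0"))  # True
```

Comparing integer tuples avoids the classic string-comparison trap where "10.3.9" would sort above "10.3.21".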
You can ignore the password for Guest User Mapping CSV, as that is a holdover from earlier versions of the solution and is not required for the native implementation of Service Discovery in this version (8.0).
If you enable the last option, Business Application Discovery and Grouping, Service Discovery creates application objects based on discovered services and their incoming and outgoing dependencies. For example, if a Microsoft SQL service is discovered and vRealize Operations observes connections from other virtual machines on the ports used by SQL, it infers an incoming dependency. It then analyzes the connecting virtual machines; suppose they are running Microsoft IIS. Service Discovery infers a two-tier application with a web tier and a database tier, and creates an application object that contains the discovered services, each assigned to the appropriate tier.
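The inference described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the product’s actual algorithm: the service-to-port map (SQL Server’s default TCP port 1433), the tier labels, and the function name infer_application are hypothetical, and the sketch only follows incoming dependencies one hop from a single seed VM.

```python
from collections import defaultdict

# Hypothetical lookup tables for illustration only.
SERVICE_PORTS = {"Microsoft SQL Server": {1433}}
SERVICE_TIERS = {"Microsoft SQL Server": "database", "Microsoft IIS": "web"}


def infer_application(seed_vm, services_by_vm, connections):
    """Group a seed VM and the VMs that connect to it into tiers.

    services_by_vm: {vm_name: set of discovered service names}
    connections: iterable of (source_vm, destination_vm, destination_port)
    Returns {tier_name: set of (vm_name, service_name)}.
    """
    tiers = defaultdict(set)
    listening_ports = set()

    # Place the seed VM's own services and remember which ports they use.
    for service in services_by_vm.get(seed_vm, set()):
        if service in SERVICE_TIERS:
            tiers[SERVICE_TIERS[service]].add((seed_vm, service))
        listening_ports |= SERVICE_PORTS.get(service, set())

    # An incoming dependency is a connection from another VM to one of the seed's service ports.
    clients = {
        source
        for source, destination, port in connections
        if destination == seed_vm and port in listening_ports and source != seed_vm
    }

    # Analyze the connecting VMs and assign their services to tiers.
    for client in clients:
        for service in services_by_vm.get(client, set()):
            if service in SERVICE_TIERS:
                tiers[SERVICE_TIERS[service]].add((client, service))

    return dict(tiers)


services = {
    "sql01": {"Microsoft SQL Server"},
    "web01": {"Microsoft IIS"},
    "web02": {"Microsoft IIS"},
}
connections = [
    ("web01", "sql01", 1433),
    ("web02", "sql01", 1433),
    ("web01", "sql01", 3389),  # not a SQL port, so not treated as a dependency
]
app = infer_application("sql01", services, connections)
print(sorted(app))  # ['database', 'web']
```

Keying the dependency on the destination port, rather than on any traffic between VMs, is what keeps unrelated connections (such as the remote-desktop session above) out of the application.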
You can then edit this application and add any other components and tiers you wish, or just rename them to something more meaningful.
As mentioned earlier, not every customer will have a common user account across all virtual machines. For those cases, you can set up unique credentials for one or more virtual machines.
From Administration > Inventory, select the Manage Services tab. This is where you can check the status of Service Discovery and perform other administrative tasks. Let’s cover credentials first.
Select one or more virtual machines from the list (you can filter the list if you wish), click the Provide Password icon, and enter the credentials. The new credentials will be used on the next discovery run, which happens once per hour.
The other options here start or stop monitoring of services. By default, vRealize Operations does not monitor the discovered services, but you can enable monitoring as shown below.
Once monitoring is enabled, metrics for the service are collected every 5 minutes. Note that Service Discovery does not alert on the status of the service, and if the service stops, it is removed from inventory. So if you want to monitor the service’s status and alert on its availability, use Application Monitoring in vRealize Operations instead. Below you can see the metrics collected, including performance, connection information, and some basic properties of the service such as version and path.
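One way to picture how the common Cloud Account credentials and the per-VM passwords fit together is a lookup with a fallback. The Python sketch below assumes that a per-VM credential, when present, takes precedence over the common Windows or Linux account; that precedence rule, the Credential type, and the resolve_credential name are assumptions for illustration, not documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    username: str
    password: str


def resolve_credential(vm_name, guest_family, common, per_vm):
    """Pick the credential to use for one VM on a discovery run (hypothetical).

    common: {"windows": Credential, "linux": Credential} from the Cloud Account
    per_vm: {vm_name: Credential} entered with Provide Password
    """
    # Assumed precedence: a VM-specific password overrides the common account.
    if vm_name in per_vm:
        return per_vm[vm_name]
    return common[guest_family.lower()]


common = {
    "windows": Credential("svc-discovery", "win-secret"),
    "linux": Credential("discovery", "linux-secret"),
}
per_vm = {"legacy-app01": Credential("root", "legacy-secret")}

print(resolve_credential("legacy-app01", "Linux", common, per_vm).username)  # root
print(resolve_credential("web01", "Windows", common, per_vm).username)  # svc-discovery
```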
Powerful Troubleshooting VM Actions
Enabling Service Discovery also adds two new items to the virtual machine object Actions menu. They are Execute Script and Get Top Processes, and both can be extremely helpful when troubleshooting issues.
Let’s start with Get Top Processes, because I believe most customers will use it regularly. When you run this action, you can select the number of processes to display; the default of 10 is usually enough for troubleshooting.
Here’s the output from a Linux guest OS:
And a Windows OS:
For both OS outputs, you can sort by CPU or memory usage by clicking the radio button in the respective columns.
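The count selection and the CPU/memory sort can be modeled in a few lines. This Python sketch works on a hypothetical in-memory snapshot (the Process fields and sample values are invented for illustration); it does not collect anything from a guest, which is the part the action does for you.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    name: str
    cpu_percent: float
    memory_mb: float


def top_processes(processes, count=10, sort_by="cpu"):
    """Return the `count` heaviest processes, sorted by CPU or memory (descending)."""
    sort_keys = {
        "cpu": lambda p: p.cpu_percent,
        "memory": lambda p: p.memory_mb,
    }
    if sort_by not in sort_keys:
        raise ValueError(f"sort_by must be one of {sorted(sort_keys)}")
    return sorted(processes, key=sort_keys[sort_by], reverse=True)[:count]


snapshot = [
    Process(101, "java", 85.0, 2048.0),
    Process(202, "sqlservr", 12.5, 8192.0),
    Process(303, "sshd", 0.1, 6.0),
]
print([p.name for p in top_processes(snapshot, count=2)])  # ['java', 'sqlservr']
print([p.name for p in top_processes(snapshot, count=2, sort_by="memory")])  # ['sqlservr', 'java']
```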
This gives you a quick view of what’s going on in the guest OS, particularly if you have an alert for high CPU or memory usage. You can see whether a process is hung and needs to be restarted, or whether a service is running that shouldn’t be.
Which brings me to the next action, Execute Script. With this action, you can enter any script that the guest OS could run if you invoked it directly from the command line. For example, suppose you find that a service is hung and needs to be restarted.
Notice that you can set a timeout in case the script is long-running (a file search, for example). The action also returns the exit code, stdout, and stderr.
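Locally, the same contract (run a command, enforce a timeout, report the exit code, stdout, and stderr) looks like the Python sketch below. It uses the standard subprocess module and is only an analogy: the Execute Script action runs inside the guest OS rather than on the machine making the request, and the execute_script name and result dictionary here are hypothetical.

```python
import subprocess
import sys


def execute_script(command, timeout_seconds=60.0):
    """Run a command with a timeout and return its exit code, stdout, and stderr."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return {"exit_code": None, "stdout": "", "stderr": "timed out", "timed_out": True}
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "timed_out": False,
    }


result = execute_script(
    [sys.executable, "-c", "import sys; print('service restarted'); sys.exit(0)"]
)
print(result["exit_code"], result["stdout"].strip())  # 0 service restarted
```

Returning a result even on timeout, instead of letting the exception escape, mirrors the idea that a long-running script should still produce a readable outcome.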
For an example of how this can help with troubleshooting (and even remediation) of an issue, watch this demonstration.
Application Monitoring Updates
Finally, the Application Monitoring capability, based on Telegraf agents, received a few updates. Application Monitoring was introduced in 7.5, and in 8.0 it moves closer to parity with Endpoint Operations. We currently do not intend to continue Endpoint Operations after the 8.0 release, so if you are still using it, consider switching to Application Monitoring.
Now, what’s new with Application Monitoring? There are three new application services now available, bringing the total to 20 supported packaged applications.
As you can see, NTPD, Java, and WebSphere cards have been added. Specific details can be found in the product documentation:
Personally, I’m excited about the addition of Custom Script Monitor in 8.0. This allows you to “roll your own metric” using a script that Application Monitoring will execute on every collection cycle. For example, consider this PowerShell script which counts the number of files in a folder in Windows:
Save the file to a local folder on the virtual machine, then configure Custom Script Monitor with the path, prefix, and arguments. The prefix is useful when you’re not running a batch or shell script and need to invoke an interpreter.
The 5-minute timeout matches the default collection interval, so it’s best to leave it at the default. Once you save, the script runs on every collection cycle, and the single numerical value it returns is collected as a metric.
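For readers who prefer Python, here is a minimal sketch of the same file-counting idea, assuming Python is installed in the guest (exactly the case where the prefix matters, since it would point at the Python interpreter). The script prints the number of files directly inside the folder passed as its first argument; the file name count_files.py and the argument handling are hypothetical.

```python
"""Hypothetical count_files.py for a Custom Script Monitor.

Prints one number: how many files sit directly in the folder given as the
first argument (the current directory if no argument is given).
"""
import os
import sys


def count_files(folder):
    # Count files only; subfolders are skipped and not recursed into.
    with os.scandir(folder) as entries:
        return sum(1 for entry in entries if entry.is_file())


def main(args):
    folder = args[0] if args else "."
    # Print a single numeric value, since the monitor collects one metric.
    print(count_files(folder))


if __name__ == "__main__":
    main(sys.argv[1:])
```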
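To make the path, prefix, and arguments fields and the single-value rule concrete, the Python sketch below shows one plausible way such a monitor could assemble the command line and read the result. The function names, the naive whitespace splitting, and the check that the timeout doesn’t exceed the 5-minute interval are assumptions for illustration; Application Monitoring’s actual parsing rules may differ.

```python
DEFAULT_COLLECTION_INTERVAL_SECONDS = 300  # the 5-minute default mentioned above


def build_command(path, prefix="", arguments=""):
    """Combine prefix, script path, and arguments into an argument list.

    Naive whitespace splitting (an assumption): quoted arguments that
    contain spaces are not supported in this sketch.
    """
    return prefix.split() + [path] + arguments.split()


def timeout_fits_interval(timeout_seconds, interval_seconds=DEFAULT_COLLECTION_INTERVAL_SECONDS):
    """A script should finish before the next collection cycle starts."""
    return 0 < timeout_seconds <= interval_seconds


def parse_metric(stdout):
    """Accept exactly one numeric token, since the monitor expects a single value."""
    tokens = stdout.split()
    if len(tokens) != 1:
        raise ValueError(f"expected one numeric value, got {len(tokens)} tokens")
    return float(tokens[0])


command = build_command(r"C:\scripts\count_files.py", prefix="python", arguments=r"C:\logs")
print(command)  # ['python', 'C:\\scripts\\count_files.py', 'C:\\logs']
print(parse_metric("42\n"))  # 42.0
print(timeout_fits_interval(300))  # True
```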
For a demonstration showing how this works, watch this video.
Get App Aware – Get vRealize Ops!
I think you’ll agree that these new features and capabilities give you better visibility and help you resolve issues in your environment faster. If you’re already running vRealize Operations, upgrade to 8.0 for these features and more. If you aren’t a vRealize Operations customer, download a free trial today!