This post is authored by Jon Herlocker, CTO, Cloud Operations Management Products, Cloud Management Business Unit, VMware.
At VMware, we are personally dedicated to making IT Operations a better place. What does that mean? To me that means providing software and automation that detects and mitigates most issues before they impact your customers, rapidly detects and resolves issues that do impact your customers, reduces the total cost of purchasing and maintaining your infrastructure, frees up time for innovation, and generally makes IT operators look good and lets them sleep better at night.
My previous experience working for a large SaaS application company with millions of users and petabytes of storage led me to appreciate firsthand the power of a log analytics system in improving the lot of IT operators and, through them, dramatically improving our customer experience. When you have hundreds or even millions of customers, it is traditionally very hard to detect when individual users are experiencing issues due to software bugs, configuration issues, and capacity or hardware problems. But each software and hardware component in your IT ecosystem has its own “Twitter-like” feed, generating a continual stream of comments on what is happening, including the small things that went wrong. What we found was that by analyzing these logs, we could detect, diagnose, and remediate issues that we didn’t even know existed because they were lost in the noise of the large ecosystem. As we went through the program of mining our logs and remediating the issues we found there, we saw significant increases in all of our customer satisfaction metrics.
Bringing that experience to VMware, we created Log Insight – a real-time log management system that leverages predictive analytics to help IT operators detect, diagnose, and remediate issues through the analysis of those Twitter-like feeds generated by each component in the IT ecosystem. Log Insight has now been on the market for just over a year, and the feedback from customers has been almost universally positive. Customers love how easy it is to install, manage, and use. They love how fast and expressive the queries are, and they love the power of machine learning being applied to automatically detect structure in logs. But most of all they love that they can solve real-world problems quickly with it (recent example).
We are working continuously towards the perfect combination of capabilities to make IT operations management a better place. To this end, Log Insight has been a great partner product to vCenter Operations Management Suite. vCenter Operations collects performance metrics from IT environments and provides powerful performance and capacity analytics, which give you a rich set of tools to detect and diagnose issues, optimize performance, and plan and optimize capacity.
Figure 1 : Health Summary for ESXi with Launch in Context to Log Insight
Let me illustrate the value of having both Log Insight and vCenter Operations within your IT Operations toolkit. Within VMware, we have an internal application with components that span geographies: the client systems reside in Seattle, and the server systems reside in a remote datacenter in eastern Washington. We monitor this application with both vCenter Operations and Log Insight. The application is internal and is revised frequently. When we receive a report that the application is performing slowly or seems to be hung, the combination of vCenter Operations and Log Insight covers all the possible causes and provides a user experience that lets us rapidly close in on the root cause. First – is it a resource contention issue? Is the WAN network bottlenecked? CPU limited? Not a problem – vCenter Operations tracks all the relevant resource utilization and has sophisticated calculations of the true practical capacity of resources – a quick glance at vCenter Operations, and you can identify or rule out a resource contention issue.
You can also use vCenter Operations’ analytics to see if any of the resource utilization metrics are abnormal for this particular time of day. If not a resource issue, then it’s a problem with the application itself – with “launch in context” integration between vCenter Operations and Log Insight, in two clicks you can quickly pull up the logs for an object that you were viewing in vCenter Operations. From here you can see from the application logs if the application is reporting a higher number of errors than usual. The predictive analytics of Log Insight automatically detect event types, so you can quickly see if there are any new errors occurring that didn’t use to occur or any new configuration actions that happened around the time of the performance degradation.
Figure 2: Log Insight showing logs for object selected in vCenter Operations Manager
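To make the idea of spotting new or more frequent event types concrete, here is a minimal Python sketch. It is not how Log Insight works internally: the `event_type` and `compare_windows` helpers, the digit-masking heuristic, and the sample log lines are all hypothetical, chosen only to illustrate comparing a baseline window of logs against the window around a performance degradation.

```python
import re
from collections import Counter


def event_type(line: str) -> str:
    """Collapse variable fields so similar messages share one template.

    Assumption: masking hex values and digit runs is a crude stand-in
    for real event-type detection; it is used here only for illustration.
    """
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.strip()


def compare_windows(baseline, incident):
    """Return event types that are new, and those seen more often, in `incident`."""
    base = Counter(event_type(l) for l in baseline)
    cur = Counter(event_type(l) for l in incident)
    # Event types that never appeared in the baseline window.
    new_types = sorted(t for t in cur if t not in base)
    # Event types present in both windows whose count went up: (before, after).
    increased = {
        t: (base[t], cur[t]) for t in cur if t in base and cur[t] > base[t]
    }
    return new_types, increased


# Hypothetical log lines, invented for this example.
baseline = [
    "ERROR request 101 timed out after 30 ms",
    "INFO user 7 logged in",
    "INFO user 8 logged in",
]
incident = [
    "ERROR request 202 timed out after 45 ms",
    "ERROR request 203 timed out after 50 ms",
    "WARN config reloaded by admin 3",
    "INFO user 9 logged in",
]

new_types, increased = compare_windows(baseline, incident)
print("New event types:", new_types)
print("Increased event types:", increased)
```

In this toy data, the configuration reload shows up as a new event type and the timeout error shows up as more frequent than in the baseline, which are the two kinds of signal described above.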
As you can see, vCenter Operations and Log Insight together can be a very effective combination. That’s why I’m really excited about the launch of the new VMware product bundle vRealize Operations Insight 5.8, which is now available to all current and future vSphere with Operations Management (vSOM) customers, as a separate, add-on solution. vRealize Operations Insight 5.8 (vROI) brings all the features of vCenter Operations Management Suite Advanced and vCenter Log Insight together into an integrated solution for performance management, capacity optimization, and real-time log analytics. Through vROI, you get predictive analytics leveraging both structured and unstructured data, ensuring better performance and uptime. If you have purchased or are thinking about buying vSphere with Operations Management, buy vRealize Operations Insight as well – and you’ll be resolving more issues faster.
Next steps for you:
- Learn more about vRealize Operations Insight
- Test drive vRealize Operations Insight with the VMware Hands-on Labs
(FYI, if you haven’t tried a Hands-on Lab, I highly recommend it. It is completely free, and you will interact via remote desktop with REAL installations of Log Insight and vCenter Operations that are dedicated to you for the duration of your lab.)