AIOps – Use Some Intelligence (Part 2)

Welcome everyone to the second and final part of my AIOps introduction. If you haven’t seen the first part, please go back to that one for context. In part 1 we talked about the challenges that exist today: with digital transformation and ever more automation in software build and delivery, there is growing pressure on IT Operations to react quickly and deal with infra, software, config, connectivity, security, multi-cloud; the list goes on and on.

As usual, if you would prefer to read the content, I’ve placed the transcript underneath. Enjoy!



Today, we’re going to look at some of the tools available and how they leverage AI/ML to solve problems. Then we’ll move beyond the tools and finish up this two-part video by looking at the future this could enable, one where Operations teams become a true enabler for business outcomes.

Let’s get into it!

As mentioned in part 1, AIOps really should be much more than another label to give IT operations tools. It’s a change in how we do things. That’s really important, but the tools are a great start on the journey.

The Tools

Today I’m going to stay in my comfort zone a little and use the VMware tools I work with as examples of how AI/ML can be used for IT Operations in different ways.

vRealize Log Insight

This is VMware’s log analytics solution. It uses ML to intelligently group logs, making it faster to find the right log to help fix the problem you’re working on.

It uses ML to analyze massive amounts of log data from across your clouds and data centers (whichever logs you forward to it), giving you near-real-time monitoring and alerts.

It also creates structure from unstructured data. If you know AI/ML, you know that the structure of the data you’re analyzing can be really important. Log Insight will, for example, bring together information on an application from all perspectives (app logs, network logs, config, messages, performance data, etc.) so that you get a big head start in your troubleshooting, because you have a holistic view of your app.
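To make the idea of grouping logs and creating structure a bit more concrete, here is a minimal Python sketch. It is not how Log Insight works internally; it simply assumes that masking the variable parts of a line (IP addresses, numbers) leaves a "template" that similar lines share. The function names and patterns are my own, purely for illustration.

```python
import re
from collections import defaultdict

# Illustrative patterns for the variable parts of a log line.
# These are assumptions for the sketch, not Log Insight's rules.
_VARIABLE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens with placeholders so similar lines match."""
    for pattern, placeholder in _VARIABLE_PATTERNS:
        line = pattern.sub(placeholder, line)
    return line


def group_logs(lines):
    """Group raw log lines by their shared template."""
    groups = defaultdict(list)
    for line in lines:
        groups[to_template(line)].append(line)
    return dict(groups)


sample = [
    "Connection from 10.0.0.5 timed out after 30 seconds",
    "Connection from 10.0.0.9 timed out after 45 seconds",
    "Disk usage at 91 percent on volume 2",
]

for template, members in group_logs(sample).items():
    print(len(members), template)
```

The two timeout lines end up in one group, so instead of reading every line you look at a handful of templates and how often each one occurs.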

vRealize Operations

Whereas Log Insight looks at the logs that come out of apps, infra, clouds, etc., vRealize Operations (vROps) starts with already defined metrics. For example, how much memory is being used? Or how many Kubernetes clusters are you running, and how are they performing?

Unlike logs, it’s about things you can measure.

vROps then applies analytics and machine learning to data collected from these metrics, across infrastructure and applications, looking at specific metrics from things like servers, VMs, apps, everything, in order to automatically spot and react to issues in real time, including things like performance optimization, capacity management and anomaly detection.
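As a rough illustration of anomaly detection on a metric, here is a small Python sketch. It is only one simple technique (a rolling z-score), not the analytics vROps actually uses, and the function name, window size and threshold are assumptions I picked for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of values far outside the recent rolling window.

    A value is flagged when it is more than `threshold` standard
    deviations away from the mean of the previous `window` values.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up memory usage samples (percent) with one obvious spike.
memory_pct = [41, 42, 40, 43, 41, 42, 95, 42, 41]
print(find_anomalies(memory_pct))
```

With this sample data only the spike to 95 is flagged. Real tools learn what "normal" looks like over much longer periods, including daily and weekly patterns, but the principle is the same.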

vRealize AI Cloud

This builds on top of the vRealize Operations metrics platform and is the first AI/ML solution to optimize infrastructure operations.

Through data collection and reinforcement learning techniques, vRealize AI Cloud continuously optimizes your infrastructure to the KPIs you’ve given (for example performance). It does this whilst factoring in the completely dynamic nature of traditional and modern applications. vRealize AI Cloud gives you a self-tuning datacenter.
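To give a feel for what "try a setting, measure the KPI, learn from it" looks like, here is a toy Python sketch. It uses an epsilon-greedy bandit, one of the simplest forms of reinforcement learning, and it is my own stand-in, not the algorithm vRealize AI Cloud uses. The setting names, the simulated KPI and the `tune` function are all hypothetical.

```python
import random


def tune(settings, measure_kpi, rounds=200, epsilon=0.1, seed=0):
    """Pick the setting with the best average KPI (epsilon-greedy)."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in settings}
    counts = {s: 0 for s in settings}

    def average(setting):
        return totals[setting] / counts[setting]

    for step in range(rounds):
        if step < len(settings):
            choice = settings[step]            # try every setting once
        elif rng.random() < epsilon:
            choice = rng.choice(settings)      # explore occasionally
        else:
            choice = max(settings, key=average)  # exploit the best so far
        totals[choice] += measure_kpi(choice, rng)
        counts[choice] += 1
    return max(settings, key=average)


# Hypothetical settings with a simulated, slightly noisy KPI score.
BASE_SCORE = {"small_cache": 0.5, "medium_cache": 0.8, "large_cache": 0.6}


def simulated_kpi(setting, rng):
    return BASE_SCORE[setting] + rng.uniform(-0.05, 0.05)


best = tune(list(BASE_SCORE), simulated_kpi)
print(best)
```

In a real datacenter the "measurement" is live telemetry and the workload keeps changing, which is why the tuning has to be continuous rather than a one-off.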

Combine the power of Log Insight, vROps and AI Cloud, and you have a self-tuning, self-healing, self-driving datacenter or cloud.

Tanzu Observability (was Wavefront)

Whereas vRealize Operations is all about looking at metrics for specific things we already know about, like compute power or number of database queries, Tanzu Observability also does some of this but is more focused on letting you build your own metrics and use different queries and AI/ML to analyze them.

A developer can place a hook (a piece of code) into their application, which sends a metric out. For example, a games developer might want to measure how many times someone uses a particular weapon in their game. That metric can be built into the code, so that it reports whenever someone uses that weapon. We can then apply anomaly detection to tell us when the typical use of that weapon changes. Maybe we can see from this data that people have stopped using this weapon so much, and it might be time to design a new one? Useful, business-level information.
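Here is roughly what such a hook could look like, as a Python sketch. The `MetricRecorder` class is a hypothetical in-memory stand-in I made up for the example; a real application would call whatever metrics client or SDK it uses, and the metric name and tags are assumptions too.

```python
from collections import Counter


class MetricRecorder:
    """Hypothetical in-memory stand-in for a real metrics client."""

    def __init__(self):
        self._counts = Counter()

    def increment(self, name, **tags):
        # Key the counter on the metric name plus its sorted tags.
        self._counts[(name, tuple(sorted(tags.items())))] += 1

    def count(self, name, **tags):
        return self._counts[(name, tuple(sorted(tags.items())))]


metrics = MetricRecorder()


def fire_weapon(player, weapon):
    # Normal game logic would go here.
    # The hook: one extra line that reports a business-level metric.
    metrics.increment("game.weapon.used", weapon=weapon)
    return f"{player} fired {weapon}"


fire_weapon("alice", "plasma_rifle")
fire_weapon("bob", "plasma_rifle")
fire_weapon("alice", "crossbow")

print(metrics.count("game.weapon.used", weapon="plasma_rifle"))
```

Once a counter like this is flowing into an observability platform, the same kind of anomaly detection used for infrastructure metrics can be pointed at weapon usage instead.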

Hopefully you can see that AI Ops suddenly starts to become an enabler for innovation. But you need to know what you’re looking for. Which leads me into the future.

The Future

I think the “AI” in AIOps really should mean Data Scientist. Data Scientists are the ones building these algorithms and methods to understand and navigate data in an intelligent and useful way.

As DevOps blurs the lines between Developers and Operations, AI Ops will start to blur the lines between Data Scientists and IT Operations. The data will all be there, and finally, it will be easily and quickly accessible through the types of tools I’ve talked about.

The automation is already happening: things are starting to be automatically self-tuned and self-healed.

In the future, we might start to see AI Ops teams building complex algorithms to improve how the platform runs (maybe similar to what’s available in vRealize AI Cloud for infrastructure).

If we go back to observability, we now know that we can build monitoring hooks into apps and eventually have a whole mess of data from all these modern apps. That data can then be queried very easily using the tools available. The limit will be your imagination.

Amazon will be looking at how long you hover over the buy button on their website for the new thing you really want (but probably don’t need!). They might use that information to decide how much more advertising they should place on you, to push you over the edge and buy that new thing.

We may soon see AIOps in more traditional enterprises start building models for supply chain forecasting too. Or, for example: customers seem to like this screen of the app more than that screen of the app, so what can we use this information for to sell more of our product? Maybe more ads on that page? (Let’s hope not).

Summing Up

The original question I asked when I released part 1 was:

“How can we start to make use of AI/ML if we’re not a developer or data scientist?”

As we’ve learned, the more we can embrace complexity and leverage AI/ML to find problems or opportunities in IT operations, the more we can use IT Operations as an enabler: to move faster and be more agile as companies, but also to be more innovative, coming up with new questions for the AI that could be transformational for the company.

You’re not going to have to be a data scientist to do this. But just as a decade ago all the traditional networking engineers started learning Python programming because they knew DevOps was coming, it’s probably time to brush up on your maths. The more you understand the data and how to manipulate it with the equations used in algorithms, the more effective you will be with AIOps in the months and years to come.

AIOps is here, so let’s leverage this AI/ML technology and move away from reactive IT operations and long conference calls pointing fingers at each other. Let’s move things up the stack, up to solving business problems and driving new revenue streams, where the value is.

And that, much more than just a single AIOps tool, is what AI Ops as a concept can really give us.

Thank you very much.