Cloud Management Platform

The Intelligent Operations Journey

IT is undergoing a transformation, or at least it should be. Whether the goal is becoming a digital business, increasing business agility, improving time to market, or focusing more on outcomes that drive overall business value, IT is the enabler and can be the driver. But IT is being asked to “step up,” more often than not without an increase in budget or headcount and without relief from existing day-job pressures. I wish I could proclaim that if you just follow these 10 steps you will overcome all your challenges, spend all your time on fun and innovative projects, and become the guest of honor at swanky C-level events, but I can’t. If I could, I would be writing this blog from my private island in some remote and exotic location, but I’m not. All is not lost, though. VMware is helping IT organizations move in the right direction, address their challenges, and deliver incredible value to the business, so at least you might start getting invited to nicer lunches where the business is truly asking for your insights and guidance on how IT can partner with them to successfully become a digital business.

One way you can endear yourself to the business is by keeping their applications running. Well, you might say “we are keeping their applications running,” but at what cost in terms of time spent, lack of sleep, or increased stress? VMware’s approach to Intelligent Operations with VMware vRealize Operations, VMware vRealize Network Insight, and VMware vRealize Log Insight can help, a lot. But, like so many things in IT, it’s not just about the tools and technology; it’s also about people and process, in this case role and skill set evolution as well as having an open mind about how you work. And, like most things in IT, it’s not instantaneous but a journey, one you can join regardless of where you are and one that progressively makes your work life, and career prospects, better.

 

Crawling Reactively

As much as I regret having to write it, most IT organizations are stuck in a reactive mode of operations. The cause may be the capabilities of the tools they’re using, the perception that there is a lack of time or resources to do anything about it, a desire to remain in one’s comfort zone, or simply the mindset that “it’s the way we’ve always done things.” This is unfortunate, especially as virtualization and the software-defined data center are implemented to enable agility and provide IT with the opportunity for operational efficiency, but it’s a fact of life.

Even in reactive mode, where IT’s modus operandi is reacting to incidents, vRealize Operations, vRealize Network Insight, and vRealize Log Insight capabilities help IT work more efficiently. A rich set of out-of-the-box dashboards and metrics; guided, analytics-based root cause analysis with recommended remediation; integrated vRealize Log Insight providing context-sensitive log handling of unstructured data like the example in Figure 1 below; and vRealize Network Insight’s mapping of data paths across virtual overlay and physical underlay coupled with contextual search capabilities are all incredible time savers when troubleshooting and resolving incidents.

 

Figure 1. Example of vROps Troubleshooting Dashboard for Root Cause Analysis

 

There really are no new skill sets required other than learning and trusting the tools. Of course, you can continuously improve and create even more efficiencies by creating custom dashboards or updating guided root cause analysis text, but there are no changes to processes, each functional team can have its own focused dashboards, and everyone can remain within their comfort zones, if that’s what they want to do. But in the end, you’ve made only an incremental improvement; it’s business as usual. Sure, you can restore service faster, but you’re still reacting.

 

Walking Proactively

I hope that’s not good enough for you. You really do want to improve your situation, and you really do believe in making your customers’ lives better. You want to take the next step, which is becoming proactive, so you can better ensure performance and availability across the software-defined data center you were so excited to help implement.

Once again, vRealize Operations, vRealize Network Insight, and vRealize Log Insight help you on your quest. Of course, you still take full advantage of the accelerated troubleshooting and resolution capabilities that greatly improve the reactive mode of operations. But there’s more: fully customizable dashboards; super metrics; prioritized alerting with actionable recommendations; policy-driven automation; policy-based, intelligent workload placement and best-fit re-balancing; a REST-based notification plug-in; and micro-segmented traffic flow mapping as shown in Figure 2 below. These are all capabilities that, if utilized, enable proactivity.

 

Figure 2. Example of vRealize Network Insight Micro-Segmented Traffic Flow Mapping

 

 

Notice I wrote “enable proactivity.” Proactivity requires a mindset change. It’s about moving from troubleshooting based on an alert or incident ticket to preventing the alert or incident ticket from happening in the first place. It’s about taking a systems-thinking approach rather than focusing on specific issues as they arise. And it’s about thinking critically about lessons learned to determine how you could have detected the symptoms and prevented the alert from happening. It does require you to roll up your sleeves and enhance your skill set by digging more deeply into vRealize Operations, vRealize Network Insight, and vRealize Log Insight to see how you can apply their capabilities proactively to your environment. You may want to learn about the REST API and how you can leverage it to be notified when your new super metric, based on a performance threshold you established, triggers an alert that something could be about to happen to a set of virtual machines hosting components of a critical application. I highly recommend exploring Operationalize Your World, the brainchild of Iwan Rahabok, Sunny Dua, and Kenon Owens, as a practical way to get started.
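To make the notification idea concrete, here is a minimal Python sketch of the decision you might make when an alert notification arrives. It is only an illustration: the payload field names (status, resourceName, criticality) and the VM names are assumptions made up for this example, not the documented format of the REST notification plug-in, so check the product documentation for the real payload before adapting it.

```python
import json

# Hypothetical set of VMs hosting components of a critical application.
CRITICAL_APP_VMS = {"web-01", "app-01", "db-01"}

# Hypothetical severity labels that should wake someone up.
SERIOUS_LEVELS = {"WARNING", "IMMEDIATE", "CRITICAL"}


def should_escalate(raw_body, watched=CRITICAL_APP_VMS):
    """Return True when a notification concerns a watched VM and is serious.

    raw_body is the JSON text of one alert notification. The field names
    used below are assumptions for illustration only.
    """
    alert = json.loads(raw_body)
    is_active = alert.get("status") == "ACTIVE"
    is_watched = alert.get("resourceName") in watched
    is_serious = alert.get("criticality") in SERIOUS_LEVELS
    return is_active and is_watched and is_serious
```

A small receiver could call should_escalate on each incoming notification and page the application owner only when it returns True, which keeps the focus on the critical application rather than on every alert in the environment.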

You may also want to take more of an interest in those boring Change Advisory Board meetings and get more actively involved by creating custom dashboards to proactively monitor a change that’s going in, then using that knowledge to update existing dashboards, super metrics, and thresholds to account for the resulting production change, proactively. Finally, you may want to reach out to your colleagues in the other functional teams to help them understand how the software-defined data center affects them and how they might supplement their monitoring with vRealize Operations, vRealize Network Insight, and vRealize Log Insight as a result. Doing any or all of these activities sets you up for the next step in the journey.

 

Running Predictively

The ultimate goal in the journey, for now at least, is predictive operations. Predictive operations is about identifying a potential service disruption or performance drop well before it happens so you have time to fix the underlying cause before the customer is impacted. vRealize Network Insight provides predictive capabilities such as showing increasing “noise” conditions that could become a problem. Advanced vRealize Operations capabilities such as dynamic thresholding and policy-based, automated re-balancing help here. Dynamic thresholding provides early warnings when performance trends outside the normal boundaries that vRealize Operations actively learns, and those warnings can be associated with policy-based, automated remediation actions. Using similar self-learning techniques, policy-based, automated re-balancing is based on predicted usage trends. A good example of proactively identifying a workload re-balancing opportunity, across clusters no less, is shown in Figure 3 below.
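If the idea of a learned “normal” range feels abstract, the following Python sketch shows the general principle with a deliberately simple stand-in: a band of the mean plus or minus a few standard deviations computed from recent history. This is not how vRealize Operations computes its dynamic thresholds; the function names and the default of three standard deviations are assumptions chosen only to illustrate the concept.

```python
from statistics import mean, stdev


def learn_band(history, k=3.0):
    """Learn a (low, high) 'normal' band from past samples of one metric.

    Simplified illustration: mean +/- k sample standard deviations.
    history needs at least two samples.
    """
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def outside_band(value, band):
    """Return True when a new sample falls outside the learned band."""
    low, high = band
    return value < low or value > high
```

For example, a band learned from a week of CPU usage samples could be re-learned every day, and any new sample for which outside_band returns True would be treated as an early warning rather than waiting for a fixed, hand-picked threshold to be crossed.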

 

Figure 3. vRealize Operations screen showing cross-cluster workload re-balancing opportunity

 

To truly take advantage of these capabilities you need to enhance your skill sets as your role starts evolving into areas associated with an “analyst.” You need to acquire an even deeper knowledge of vRealize Operations to understand how to apply its dynamic thresholding, custom policy, and trending capabilities. You also might consider dusting off or picking up some Python skills as well as learning VMware vRealize Orchestrator to develop and customize automated actions. You might go further by integrating vRealize Operations with other time-based metric sources or writing interfaces to your ITSM tool to automatically create, fill in, and close Incident Tickets for auditing and Problem Tickets for root cause analysis. After all, you fixed an “incident” before it occurred. But it will be worth the effort from a career perspective as this is where Intelligent Operations will, predictably, continue to evolve.
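As a rough idea of what the ITSM piece might look like, here is a short Python sketch that records a prevented incident as two tickets: an Incident Ticket that is closed immediately for the audit trail, and a Problem Ticket left open for root cause analysis. Everything in it is hypothetical, including the Ticket class, its fields, and the function name; a real integration would map these onto whatever API your ITSM tool actually exposes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ticket:
    """Hypothetical in-memory ticket; not tied to any real ITSM product."""
    kind: str                      # "incident" or "problem"
    summary: str
    status: str = "open"
    notes: list = field(default_factory=list)


def record_prevented_incident(alert_name, resource, action_taken, now=None):
    """Build the ticket pair for an issue that automation fixed in advance."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()

    # The incident never reached the customer, so it is created and closed
    # in one step purely to leave an auditable record.
    incident = Ticket("incident", f"Prevented: {alert_name} on {resource}")
    incident.notes.append(f"{stamp} automated action: {action_taken}")
    incident.status = "closed"

    # The problem ticket stays open so someone looks into the root cause.
    problem = Ticket("problem", f"Root cause review: {alert_name} on {resource}")
    problem.notes.append(f"{stamp} linked to auto-closed incident")
    return incident, problem
```

Keeping ticket creation in one small function like this makes it easy to call from whatever automated action did the remediation, and to swap the in-memory Ticket objects for calls to your ITSM tool later.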

=======

Kevin Lees is the field Chief Technologist for IT Operations Transformation at VMware, focused on how customers optimize the way they operate VMware-supported environments and solutions. Kevin serves as an advisor to global customer senior executives for their IT operations transformation initiatives and leads the IT Transformation activities in VMware’s Global Field Office of the CTO.