(This is a repost of the original from the VMware CloudOps blog.)
by Paul Chapman, VMware Vice President Global Infrastructure and Cloud Operations
Most IT departments take much the same “swarming” approach to service incidents and problems when they occur.
For most of my career, IT has been a reactive business: we waited until there was a problem and then scrambled, very effectively, to solve it. We were good at tactical, reactive problem solving, but our monitoring focused on availability and on catching service degradation. It did not proactively analyze patterns to predict problems and stay ahead of them. In the new world of IT as a service, where expectations are very different, that model no longer works.
New and emerging forensics tools give IT the ability to be proactive and predictive, and to focus on quality of service and end-user satisfaction, which is a must in the cloud era.
Forensics: A new role for IT
Take new network forensics tools, which monitor and analyze network traffic. They may seem a natural fit for network engineers, but at VMware we found that the required skill sets are quite different. We need people with an inquisitive mindset: a kind of “network detective” who thinks like a data analyst and can examine patterns and diagnostics to find problems before they are reported or turn into user impact.
People in newly created IT forensics roles may have a different set of skills than a typical IT technologist. They may not be technology subject matter experts at all. They may be closer to data scientists who can spot patterns and string together clues to find the root cause of potential problems.
Adding this new type of role to the IT organization certainly presents challenges, because it runs counter to the way IT has traditionally operated. But this shift to a new way of delivering service, moving from the traditional swarm model to a more predictive, forensics-driven model, requires a new way of thinking about problem solving. Most importantly, forensics has the potential to significantly reduce service impact and to maintain a high level of service availability and quality.
Quality of service and reducing end user friction
Every time an end user has to stop and depend on another human to fix an IT problem, it’s a friction point. Consumers have come to expect always-on, 100 percent uptime, and they don’t want to take the time to open a ticket or to pause and depend on another human to meet their need. As IT organizations, we need to focus more on the user experience and quality of service; being available 100 percent of the time is now simply table stakes.
With everything connected to the “cloud,” it’s even more important for IT to be proactive and predictive about potential service issues. Applications pull from different systems and processes across the enterprise and across clouds. Without the right analysis tools, IT can’t understand the global user experience or see where friction points may be occurring. In most cases, IT finds out about poor quality of service when users complain, perhaps even publicly on social networks. Unless we get in front of possible issues and take an outside-in, customer-oriented view, we’re headed for lots of complaints about quality of service.
At VMware, we have seen a significant reduction in overall service impact since adopting network forensics, and we’re keeping our internal customers productive. Focusing on quality of service and finding people with the right skill sets to fill the associated roles has us unearthing problems long before our end users experience so much as a glitch.
Follow @PaulChapmanVM on Twitter.