Authors:
Cameron Haight, VP and CTO, Americas
Sidd Mallannagari, Director Strategic Initiatives Cloud Management BU
IT environments continue to become more complex, leading Infrastructure and Operations teams to seek solutions that simplify their day-to-day management efforts. While such solutions are not guaranteed to be a panacea, the hope is that those incorporating AI and machine learning algorithms will give these teams new capabilities to address the challenges.
As a company that has provided data center technology to improve IT operations for over twenty years, VMware understands that IT organizations need to balance costs against the flexibility that the business needs for its digital transformation efforts. We also realize that the cognitive demands on IT operations teams continue to grow as a result of the unavoidable complexity that accompanies many transformation efforts.
Consequently, VMware is committed to delivering on a vision of an increasingly self-managing infrastructure: a self-driving datacenter. Keenly aware of past efforts that had similar goals, VMware intends to deliver this highly automated functionality using a phased approach. Ultimately, however, we seek to deliver IT management capabilities that will make the datacenter self-deploying, self-securing, self-optimizing, self-healing and self-escalating.
To assist us with our vision and guide our efforts, we conducted an online survey of 122 Enterprise customers (those having 5000 employees or more) who are part of our Inner Circle Customer Advocacy program. The rest of this blog distills what we think are some of the key conclusions.
First, we sought to assess the state of adoption of AI and ML capabilities to gauge enterprises' level of comfort in making AI essentially a member of the “team.” Twelve percent of Enterprise customers indicated that they are currently using these technologies, while 28% said that they anticipate adoption in the next 3 years. Customers are clearly in the early phase of adoption.
However, 49% stated that they have no plans yet to adopt such tools. This number surprised us, as many tools available today (including some of VMware’s current products such as vRealize Operations) incorporate machine learning capabilities that may not be obvious to the end user. In addition, the terms AI and ML carry a wide array of suggested meanings, which likely causes confusion about what capabilities are implied. The responses regarding adoption concerns later in the survey also shed light on this data point. Finally, there are levels of self* functionality, much like the levels of autonomy defined for self-driving cars, so we might presume that plans for AI/ML-directed automation vary with the level of sophistication, a hypothesis which we test later.
As IT operations and other support teams perform a wide variety of functions, we wanted to identify where the application of AI and ML technology would be deemed most useful. Respondents could select multiple options. The greatest interest was in tackling the fundamental task of troubleshooting as well as the related jobs of performance and capacity management (which differ primarily with regard to the context of time). Security management, not surprisingly, was also ranked very high, and we believe the increased urgency for added support in this area is largely due to the exponential growth in cyber threats. The top four choices, each selected by more than 50% of the respondents, represent areas of increasing complexity and cognitive load that require high levels of skill, which suggests latent demand for solutions among IT operations practitioners. While a result where every functional area ranked high would not have given us much insight, we were still surprised that AI/ML-based compliance and cost management ranked so low given the increasing focus on regulatory requirements as well as the always pressing need to drive down costs.
While technological innovation is designed to solve a problem, it can also bring new challenges, so we sought to uncover the concerns with leveraging AI and ML for infrastructure operations. Lack of technology maturity ranked highest, although, as suggested earlier, there may be a lack of clarity about where machine learning has already been used within the organization. Related statistical modeling capabilities have certainly been used within IT for decades, so this concern likely relates to newer types of machine learning, such as reinforcement learning.
It was very interesting to see added complexity ranked so highly (29%). There is a lot of research in areas such as aviation, plant operations and even today’s emerging autonomous car market showing that the implementation of automation and AI results in complexity emerging elsewhere (in the form of, for example, humans now having to understand what the machine is doing on their behalf). Trustability was also a concern, which research suggests may be due to the underlying algorithms being increasingly perceived as a “black box” and hence indecipherable due to their growing sophistication. Both of these data points suggest that the clarity of the human-to-machine interface will loom large.
To assess the desired level of self-managed infrastructure sophistication, we presented a model somewhat similar to that used to describe levels of autonomy for self-driving vehicles. Partial automation, which we described essentially as automation responding to known events, was the preferred level for enterprises, with thirty-nine percent selecting it. Conditional automation, which in our categorization had significantly more advanced capabilities (such as real-time remediation and predictive analytics), was next at thirty-one percent. Not surprisingly, neither lesser nor even more advanced capabilities were the preferred choice, which we attribute largely to the areas of concern explored in the previous question.
Understanding the primary drivers of AI/ML-based data center automation was another critical element as we seek to deliver VMware’s vision. Three of the four most critical drivers did not seem to require much in the way of system-based sophistication. Classic automation today can already improve agility, enable the transfer of funding from KTLO (keep the lights on) to innovation projects such as digital transformation, and reduce overall spend.
Perhaps most interesting is the need to address complexity while at the same time limiting the added complexity of the solution, as we saw earlier. As we identified in the introduction, the conundrum that many IT operations and infrastructure organizations face is the requirement to support new technologies and platforms for the business without magnifying complexity and hence costs. Another interesting data point is that the need to implement AI and ML-based solutions is not primarily to address a lack of skills. Candidly, this seems a bit hard to square with the need to handle complexity, since one would presume that the two might be related. However, as the survey data is focused on the needs of our largest enterprise customers, skills availability might not be as acute a problem as it typically is in smaller organizations.
Summary
For the large VMware customers that we surveyed, there is a strong desire for a more self-managing infrastructure that enables IT infrastructure and operations to be a better partner to the business. Enterprises expect AI and machine learning-based solutions to have fairly sophisticated capabilities. Yet there are limits to the degree of autonomy that these same organizations wish to grant such solutions. Much of this revolves around concerns about maturity and transparency as well as ensuring that new forms of complexity do not arise in response.
What this tells us is that we should ensure that the interface between an increasingly sophisticated automated system and its human partners provides the necessary degree of explainability and awareness, and even indicates the degree of uncertainty in its algorithmic conclusions. Only by establishing sufficient trust will the outcome be what some researchers would describe as a truly effective “joint cognitive system.”
Last year VMware announced Project Magna, demonstrating our intention to evolve to a self-driving datacenter where the infrastructure is continually and automatically optimized to deliver on the intended performance of dynamic applications. Leveraging what is called reinforcement learning, Project Magna will initially focus on optimizing vSAN parameters to meet desired customer key performance indicators (KPIs). For more information on this exciting new offering, please visit:
http://www.vmware.com/go/magna