Part 1: Use your SD-WAN data better
Note: This is the first in a four-part series about how VMware Edge Network Intelligence™ provides AIOps in SD-WAN, enabling better insights for IT into client device experience and client behavior.
AIOps helps companies make sense of massive amounts of SD-WAN data
SD-WAN generates vast amounts of data about network activity, distributed application usage and application performance. This data is a potential goldmine of actionable information – if you have the right tools. Let’s look more closely at how your company can use AIOps in SD-WAN as part of an overall approach to networking.
Globally, VMware SD-WAN™, a service of VMware SASE™, generates around 10 billion application flow records each day. These records describe traffic traversing the SD-WAN network and are collected by VMware SASE Orchestrators. This data includes potentially valuable information on application performance throughout the entire network. In these flows, we can see each application, each device using that application, and the experience of those devices.
Advantages of automation and big data
This raw data can be explored and visualized, but at this magnitude it is not practical to analyze it manually for actionable insights. Even if a single VMware SD-WAN customer manually explored its own data, there’s only so much insight possible. Another advantage of this giant data set is that it encompasses traffic from many customers, from single users to international corporations, across all vertical markets. Data is coming from all over the world from different types of networks. An analytics platform can gain valuable insights by looking at that anonymized data and its context, analyzing it in an intelligent way to provide meaningful benefits.
The VMware SD-WAN platform collects data from multiple vantage points. This allows the same application flow to be viewed from the perspective of the VMware SD-WAN Edge, the VMware SD-WAN Gateway, and/or the SD-WAN hub. With the right analytics, this data allows the system to identify faulty segments between the client LAN, the enterprise WAN, the Internet and the data center LAN. It’s another way of analyzing data in order to gain insights into network problems.
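To make the multi-vantage-point idea concrete, here is a minimal sketch of how per-segment measurements might be derived and compared against a baseline. It assumes, purely for illustration, that each vantage point reports a cumulative latency measured from the client up to that point; the segment names, the `PATH` list, and both functions are hypothetical and do not describe the product's actual data model.

```python
from typing import Dict, List, Tuple

# Hypothetical ordered path. Each reading is the cumulative latency (ms)
# measured from the client up to the end of that segment.
PATH = ["client_lan", "enterprise_wan", "internet", "data_center_lan"]


def segment_latencies(cumulative_ms: List[float]) -> Dict[str, float]:
    """Convert cumulative readings into the latency added by each segment."""
    if len(cumulative_ms) != len(PATH):
        raise ValueError("expected one reading per segment in PATH")
    segments = {}
    previous = 0.0
    for name, reading in zip(PATH, cumulative_ms):
        segments[name] = reading - previous
        previous = reading
    return segments


def worst_segment(
    cumulative_ms: List[float], baseline_ms: Dict[str, float]
) -> Tuple[str, float]:
    """Return the segment whose latency exceeds its baseline the most."""
    segments = segment_latencies(cumulative_ms)
    excess = {name: segments[name] - baseline_ms[name] for name in PATH}
    name = max(excess, key=excess.get)
    return name, excess[name]


baseline = {
    "client_lan": 2.0,
    "enterprise_wan": 10.0,
    "internet": 20.0,
    "data_center_lan": 3.0,
}
# One flow seen from several vantage points: the jump from 12 ms to 90 ms
# points at the Internet segment.
print(worst_segment([2.0, 12.0, 90.0, 93.0], baseline))
```

Subtracting adjacent readings is what lets a single slow flow be attributed to one segment rather than to the path as a whole.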
Managing a complex SD-WAN infrastructure requires a new approach
VMware Edge Network Intelligence: A vendor-agnostic AIOps solution that delivers a rich client experience
The modern network continues to grow in complexity as cloud-based infrastructure as-a-service (IaaS) providers such as AWS and Azure, and third-party software as-a-service (SaaS) applications such as Salesforce and Microsoft 365, enter the equation. SD-WAN helps the enterprise interconnect heterogeneous devices – including end user devices such as laptops and phones, and increasingly IoT devices such as point-of-sale and medical devices – with applications that are deployed anywhere and over different types of cloud services.
Locating problems, determining patterns
In this environment, enterprise IT needs to understand many data points when application problems occur. Are problems related to the devices accessing these applications? Where is the problem located? Is it something going on in the campus or branch? Is there a problem in the network? Perhaps it resides in the data center or cloud, or even with the application itself?
Determining high-level patterns affecting application performance is also critical. For example, does a service provider experience regular outages, or is there a fault in the enterprise’s network that local IT staff can fix? Successfully analyzing network data requires context and the understanding of how applications are performing over complex SD-WAN networks. Network data analysis can also provide valuable general knowledge about the trends of application adoption and usage.
Learning “normal,” automatically
An automated AIOps process, fueled by machine learning, is key to making sense of the massive amounts of data that SD-WAN infrastructure generates. VMware Edge Network Intelligence begins by using data to automatically determine baselines – to define “normal” – for network activity and application performance levels. For example, what is a normal user experience for Microsoft 365? What is the normal percentage of devices that fail to connect? What is the normal number of users for a given enterprise?
Baselines also need to account for time of day, because network activity varies with the normal business hours and time zone of each enterprise and its branch sites. The system will also learn the norms for each entity’s type of WAN link (satellite, cellular, MPLS, etc.) and vertical (a hospital network will look very different from a retail network). Importantly, VMware Edge Network Intelligence performs these tasks automatically, rather than relying on a manual and inefficient process. Ultimately, this baseline information helps determine the typical application experience for each user.
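As a rough illustration of a time-aware baseline, the sketch below groups samples by site, weekday, and hour, then takes the median of each group as "normal." The function name, the sample layout, and the choice of the median are assumptions made for this example; timestamps are assumed to already be in each site's local time.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def build_baselines(samples):
    """Return {(site, weekday, hour): median value} for the given samples.

    `samples` is an iterable of (site, local_timestamp, value) tuples.
    """
    buckets = defaultdict(list)
    for site, ts, value in samples:
        buckets[(site, ts.weekday(), ts.hour)].append(value)
    return {key: median(values) for key, values in buckets.items()}


samples = [
    ("branch-1", datetime(2024, 1, 1, 9), 40),  # Monday, 09:00
    ("branch-1", datetime(2024, 1, 8, 9), 60),  # following Monday, 09:00
    ("branch-1", datetime(2024, 1, 1, 2), 5),   # Monday, 02:00 (off hours)
]
baselines = build_baselines(samples)
print(baselines[("branch-1", 0, 9)])  # busy-hour baseline
print(baselines[("branch-1", 0, 2)])  # off-hours baseline
```

Keeping a separate baseline per hour of the week means a quiet 02:00 reading is not compared against a busy Monday morning.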
Establishing baselines and thresholds
The VMware Edge Network Intelligence platform uses a Bayesian machine learning approach to data analysis. The system analyzes time series application performance data for each device on a network, combining it with other factors including service provider, WAN link type, and business vertical. Once baselines are established for the enterprise network, the next step involves determining the thresholds for any degradation of a specific baseline.
Traditionally, administrators of legacy enterprise networks set performance thresholds manually. This is time-consuming, involves a lot of guesswork, and can lead to alert fatigue. VMware Edge Network Intelligence takes advantage of AIOps to automate this process, including alerting network engineers to issues that are likely to be real. It’s no longer a case of “I think my fix worked because I haven’t seen any alerts.” The machine learning routines provide the ability to detect the root cause of faults, and the proof to show that the mitigation strategy worked (or didn’t!).
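The article does not describe the product's actual model, so the following is only a sketch of what a Bayesian baseline with an automatically derived threshold could look like. It assumes a Beta-Binomial model for one metric (the fraction of connection attempts that fail); the class name, the uniform prior, and the `k` multiplier are all illustrative choices.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class FailureRateBaseline:
    """Beta posterior over a connection failure rate (hypothetical model)."""

    alpha: float = 1.0  # prior pseudo-count of failures (uniform prior)
    beta: float = 1.0   # prior pseudo-count of successes

    def update(self, failures: int, successes: int) -> None:
        """Fold newly observed counts into the posterior."""
        self.alpha += failures
        self.beta += successes

    @property
    def mean(self) -> float:
        """Posterior mean failure rate, used as the learned 'normal'."""
        return self.alpha / (self.alpha + self.beta)

    def predictive_std(self, attempts: int) -> float:
        """Std of the failure fraction expected over `attempts` new attempts,
        using the Beta-Binomial predictive variance."""
        total = self.alpha + self.beta
        variance = (self.alpha * self.beta * (total + attempts)) / (
            attempts * total**2 * (total + 1)
        )
        return sqrt(variance)

    def threshold(self, attempts: int, k: float = 3.0) -> float:
        """Degradation threshold: k predictive stds above the baseline."""
        return min(1.0, self.mean + k * self.predictive_std(attempts))

    def is_degraded(self, failures: int, attempts: int, k: float = 3.0) -> bool:
        if attempts <= 0:
            return False
        return failures / attempts > self.threshold(attempts, k)


baseline = FailureRateBaseline()
baseline.update(failures=4, successes=94)   # history: roughly 5% failures
print(round(baseline.mean, 3))
print(baseline.is_degraded(failures=30, attempts=100))  # well above normal
print(baseline.is_degraded(failures=6, attempts=100))   # within normal noise
```

Because the threshold comes from the learned distribution instead of a hand-picked number, it tightens as more history accumulates and widens for small sample windows, which is one way to reduce the guesswork and alert fatigue described above.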
Shows root causes and how to fix them
The insight provided by analyzing application performance at different vantage points yields enough data for AIOps to determine the problem’s location. VMware Edge Network Intelligence uses a nearest-neighbor-style analysis to identify correlating symptoms and determine the likely fault area. Perhaps it’s a poor WAN link at a branch? Or possibly the problem is in the data center, the application itself, or even in the WAN? In the end, the system figures things out automatically using application flow data from multiple vantage points.
You can see an example in the screenshot below. VMware Edge Network Intelligence has automatically detected that an abnormally high number of clients had poor Microsoft 365 performance, indicated by the spike in the graph at the top right, and determined that the most likely root cause is poor Wi-Fi. The bottom left of the screenshot shows that Edge Network Intelligence is integrated with this organization’s ticketing system.
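A nearest-neighbor-style analysis can be sketched with a plain k-nearest-neighbors vote: compare the current symptoms with previously labeled incidents and pick the most common fault area among the closest matches. The feature layout (Wi-Fi retry rate, WAN loss, server response time, each scaled to 0..1), the labels, and the function name below are assumptions for illustration, not the product's actual features.

```python
from collections import Counter
from math import dist


def likely_fault_area(symptoms, labeled_incidents, k=3):
    """Vote among the k past incidents closest to `symptoms`.

    `labeled_incidents` is a list of (feature_vector, fault_area) pairs.
    """
    nearest = sorted(
        labeled_incidents, key=lambda incident: dist(symptoms, incident[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical history: (wifi_retry_rate, wan_loss, server_response_time)
history = [
    ((0.9, 0.1, 0.1), "wifi"),
    ((0.8, 0.0, 0.2), "wifi"),
    ((0.1, 0.9, 0.1), "wan"),
    ((0.2, 0.8, 0.2), "wan"),
    ((0.1, 0.1, 0.9), "application"),
]
print(likely_fault_area((0.85, 0.05, 0.15), history))
print(likely_fault_area((0.15, 0.85, 0.15), history))
```

Features need to be on comparable scales for Euclidean distance to be meaningful, which is why this example assumes values already normalized to the 0..1 range.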
Example screenshot from VMware Edge Network Intelligence shows how applications are performing against a normal baseline, and suggests ways to fix them
AIOps helps determine if network changes really worked
Because VMware Edge Network Intelligence uses machine learning to determine enterprise network baselines and performance thresholds, it’s also useful for tracking the efficacy of any configuration, architectural, or device changes to the network. Once normal baselines are established, it becomes easy to determine if any network change had a positive, negative, or no effect on network performance.
Implementing a network change, updating the baselines, and analyzing the before and after performance data provides a more accurate picture of the result. It’s a better approach than the legacy model, where network engineers relied on anecdotal reports and imperfect manual analysis. Ultimately, the VMware Edge Network Intelligence automated AIOps approach provides a more objective view of the effect of any configuration change.
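One simple way to replace anecdotal reports with a before-and-after comparison is to check whether the shift in the mean is large relative to the noise in the measurements. The sketch below uses a Welch-style statistic with a fixed cutoff; the function name, the cutoff of 2.0, and the three result labels are assumptions for this example, and each sample list needs at least two values.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def change_effect(before, after, lower_is_better=True, cutoff=2.0):
    """Classify a network change as 'positive', 'negative', or 'no effect'."""
    diff = mean(after) - mean(before)
    # Standard error of the difference between the two sample means.
    std_err = sqrt(variance(before) / len(before) + variance(after) / len(after))
    if std_err == 0:
        score = 0.0 if diff == 0 else copysign(inf, diff)
    else:
        score = diff / std_err
    if abs(score) < cutoff:
        return "no effect"
    improved = score < 0 if lower_is_better else score > 0
    return "positive" if improved else "negative"


latency_before = [80, 82, 79, 81, 78]  # ms, before the change
latency_after = [60, 61, 59, 62, 58]   # ms, after the change
print(change_effect(latency_before, latency_after))
print(change_effect(latency_before, latency_before))
```

Set `lower_is_better=False` for metrics such as throughput, where an increase is the improvement.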
VMware Edge Network Intelligence data analysis can truly show the ROI provided by the SD-WAN platform – after both the initial implementation and any subsequent configuration changes. It goes beyond just learning baselines, identifying deviations in performance, and alerting IT staff. Edge Network Intelligence enables enterprise IT to be proactive about their network environment, fix problems before they happen, and predict where issues might occur so that they are handled ahead of time.
Analyzing historical data for proactive network management
Using VMware Edge Network Intelligence machine learning algorithms to analyze historical network data helps create a model of where problems typically occur and their impact. The platform makes recommendations and predictions to the enterprise IT team on the effects of any network changes. For example, the platform might predict that prioritizing a billing application hosted in the data center improves client experience on devices using the application over the network. This prediction is based on historical analysis, noting poor experience with this particular application during various usage scenarios. Machine learning routines can cluster this historical data, grouping records by similarity. Identifying common issues drives the platform’s recommendation engine. VMware Edge Network Intelligence uses data across customers for this analysis, tying back to the idea that learning from all customers benefits each customer.
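Clustering historical data by similarity can be illustrated with a very small "leader" clustering routine: each incident joins the first cluster whose centroid is close enough, otherwise it starts a new cluster. This algorithm is swapped in here only because it is short and deterministic; the article does not say which clustering method the platform uses, and the two-feature incident vectors and the `radius` value are hypothetical.

```python
from math import dist


def leader_cluster(points, radius):
    """Group points into clusters whose centroids are within `radius`."""
    clusters = []  # each cluster: {"centroid": tuple, "members": list}
    for point in points:
        for cluster in clusters:
            if dist(point, cluster["centroid"]) <= radius:
                cluster["members"].append(point)
                count = len(cluster["members"])
                # Recompute the centroid as the mean of all members.
                cluster["centroid"] = tuple(
                    sum(member[i] for member in cluster["members"]) / count
                    for i in range(len(point))
                )
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append({"centroid": tuple(point), "members": [point]})
    return clusters


# Hypothetical incident features: (wifi_retry_rate, wan_loss), scaled to 0..1
incidents = [(0.9, 0.1), (0.85, 0.15), (0.1, 0.9), (0.12, 0.88), (0.88, 0.12)]
clusters = leader_cluster(incidents, radius=0.2)
for cluster in clusters:
    print(len(cluster["members"]), cluster["centroid"])
```

The largest clusters correspond to the most common recurring issues, which is the kind of signal a recommendation engine could prioritize.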
The time is right for AIOps in SD-WAN
AIOps is becoming an essential ingredient in any SD-WAN implementation. The power of machine learning helps enterprises make sense of the massive amounts of data generated by any network on a daily basis. Predictive insights make it easier for network engineers to ensure higher network performance while end users and IoT devices gain higher productivity.