Buckle up, 5G is here
Over the last year, and in every day that goes by, we see 5G becoming a reality. The 5G evolution, thanks to the promise of the low latency and high bandwidth, enables many new use cases that were unavailable until today, with new revenue opportunities, such as autonomous vehicles, smart healthcare, smart cities, and industry 4.0.
On the other hand, 5G technology comes with multiple challenges. Network reliability and performance in multi-vendor, multi-domain, and high data-volume infrastructure demands an end-to-end assurance solution that can detect issues, correlate issues between all the parts of the infrastructure, and react quickly with the right remediation action to ensure zero downtime.
Detect fast, react fast
Embracing a 5G architecture calls for changes in the way telco operators operate their networks. Traditionally, a Network Operations Center (NOC) would split between the infrastructure operations and the network services operations. However, managing 5G operations requires correlation between the various layers.
When deploying a service built on a cloud-native architecture that is hosted on a cluster of COTS (Commercial off the shelf) servers, telco administrators are often faced with the following issues.
- Monitoring alarms stemming from different layers of the infrastructure, such as physical servers, the virtualization layer, or Kubernetes clusters.
- Correlating these issues and arriving at the right root cause of the problem
- Meeting SLAs defined to keep the networks running at optimum levels
Additionally, in a distributed cloud environment, such as open RAN or MEC, transport network operability becomes a key reliability factor for high quality services delivered at the network edges.
To overcome the complexity of operating 5G network services, an operator needs to integrate with various Element Management Systems (EMSs) from different domains, open-source tools, inventory systems, and orchestration systems, all preferably visible under a single pane of glass. Correlating between all the alarms, logs, and metrics, identifying the root cause of issues, and then reacting accordingly to solve the problem is an additional task. Because every moment of delay could lead to additional failures, all of this must happen quickly – i.e., detect issues, correlate them, isolate the root cause, and react to remediate.
Automation, automation, automation
After identifying the root cause of the alert, a remediation action might be simple task such as escalating the issue to a different team member or opening a trouble ticket, or some more complex actions, such as connecting to elements for debugging, running some query requests, or invoking a workflow in an automation tool. The rich variety of tools and capabilities requires rich skills set and familiarity with the different tools and protocols.
To simplify the process, and reduce the reaction time dramatically, many of the operations can be automated. To automate operations, the operations-user needs to identify the right policies that trigger remediations to solve a problem. Some policies might be simple, and will include one or two metrics and conditions, but in many cases, policies might become complex, composed from metrics and conditions from various domains into one policy formula.
When looking to the future, AI/ML can assist in reducing the complexity in building the right formula of the remediation policies, which will increase the network operations efficiency and reliability. Also, it would allow recommending the operations user the best remediation tool to use in every issue. Moreover, using anomaly detection and trends, AI/ML will be able to prevent failures and performance degradation proactively even before the failure occurs.
VMware Telco Cloud Service Assurance offers closed loop remediation
VMware Telco Cloud Service Assurance lets you monitor and manage 5G physical and virtual components from the mobile core to the RAN and the edge. From a centralized location, you get integrated operational intelligence on 5G infrastructure and components across multi-vendor domains, including physical, virtualized, and containerized environments.
Network operations centers (NOCs) and service operations centers (SOCs) can manage many networks as one, to rapidly resolve network performance issues. Automatic association of issues with customers enables you to remediate issues by priority and deliver a consistent level of service quality. The result reduces costs, improves operational efficiency, and boosts customer satisfaction.
The platform’s root cause and service impact analysis capabilities resolve problems quickly by automatically correlating symptoms from the layers of the infrastructure stack (physical, virtual, Kubernetes, CNFs, VNFs, and services) and pinpointing the problem’s root cause. After identifying the root cause, VMware Telco Cloud Service Assurance provides a rich set of manual and automated remediation capabilities, from an SSH connection to the device and opening tickets in third-party systems to full remediation workflows, including integration with various third-party systems and orchestrators. Using its ML/AI capability, VMware Telco Cloud Service Assurance acts a proactive protector in detecting issues before they occur and remediating them with the right action at the right time.