Throughout the history of digital telecommunications, communications service providers (CSPs) have sought new technologies and processes to stay ahead. The most significant assets of CSPs have always been their networks, which deliver high-speed, low-latency communications services to millions of customers.
To effectively serve their customers, CSPs need their network to function more like a utility than a standard network transport infrastructure. With the advent of 5G, networks are now often built not only with tens of thousands of network nodes and over a million ports but also with cloud infrastructure containing thousands of servers that host both virtual and containerized network functions. CSPs, therefore, work constantly to revamp the systems that monitor their networks and services to continue to meet rising customer expectations.
CSPs often face the challenge of managing their IP backbone, Metro Ethernet, and transport networks while maintaining nonstop availability and high-speed performance of the cloud infrastructure. Many CSPs lack the ability to quickly diagnose and respond to issues across their sprawling physical networks and cloud infrastructures and, more to the point, to automate the analysis of and responses to those issues.
The Problem: Staying afloat in an ocean of network alarms
An operations team can be overwhelmed by tens of thousands of alarms every day from its massive, multi-layer network and existing solutions, making it impossible to quickly zero in on the real, serious issues rather than the symptoms. Worse, even when the true problem is identified, the operations team often cannot tell which customers are affected without manually comparing the network topology, the cloud infrastructure, and the customer management databases. This manual approach makes it difficult to immediately recognize which problems are likely to result in expensive violations of service‑level agreements (SLAs). Additionally, the CSP’s network management system (NMS) rules have to be updated each time the network topology changes or a new service is deployed, a huge manual effort that takes thousands of hours annually.
The Solution: Multi-layer root-cause analysis, monitoring, and reporting
With the service assurance capabilities of VMware Telco Cloud Platform, you gain new and clear visibility into cloud infrastructure while observing the performance of your physical networks by obtaining various data points across multiple layers, from physical and virtual infrastructures to services. Having multi-layer data points enables you to perform more comprehensive root-cause analysis, converting the data points to actionable insights for faster trouble remediation and resolution.
Figure 1: Multi-layer, multi-vendor service assurance.
The multi-layer, multi-vendor service assurance from VMware by Broadcom drastically simplifies network operations, making life much easier for you and your operations teams while improving the service experience of your customers. The solution automatically discovers the topology of the network through standard protocols by interfacing with thousands of physical devices and virtual and containerized network functions from a multitude of vendors.
The service assurance solution continuously maintains a view of the service-providing objects, their interrelationships, the configured services, and the consuming subscribers to automatically determine the root cause of issues. The benefit of this comprehensive view, which spans the physical networks and the cloud infrastructures, is a reduction of troubleshooting time from hours to minutes and the elimination of unnecessary truck rolls.
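To make the idea of topology-based root-cause analysis concrete, the following is a minimal Python sketch, not the product's actual algorithm. It assumes a hypothetical dependency map (`DEPENDS_ON`) and a hypothetical service-to-subscriber map (`SUBSCRIBERS`); all object and customer names are invented for illustration. An alarmed object is treated as a root cause only if nothing it depends on is also alarmed, and affected subscribers are found by walking the dependencies downstream from that root.

```python
from collections import defaultdict

# Hypothetical topology: each object maps to the objects it depends on.
DEPENDS_ON = {
    "router-1": [],
    "link-A": ["router-1"],
    "vnf-firewall": ["link-A"],
    "service-vpn-42": ["vnf-firewall"],
}

# Hypothetical mapping of services to their consuming subscribers.
SUBSCRIBERS = {"service-vpn-42": ["acme-corp", "globex"]}


def find_root_causes(alarmed, depends_on):
    """Return alarmed objects that have no alarmed object upstream of them."""
    alarmed = set(alarmed)

    def has_alarmed_upstream(obj, seen):
        for dep in depends_on.get(obj, []):
            if dep in seen:
                continue  # already visited; also guards against cycles
            seen.add(dep)
            if dep in alarmed or has_alarmed_upstream(dep, seen):
                return True
        return False

    return sorted(o for o in alarmed if not has_alarmed_upstream(o, {o}))


def affected_subscribers(root, depends_on, subscribers):
    """Collect subscribers of everything that depends, directly or not, on root."""
    # Invert the dependency map so we can walk downstream from the root cause.
    dependents = defaultdict(list)
    for obj, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(obj)

    impacted, stack, seen = set(), [root], {root}
    while stack:
        obj = stack.pop()
        impacted.update(subscribers.get(obj, []))
        for child in dependents[obj]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(impacted)


# Three alarms arrive, but two of them are only symptoms of the link failure.
alarms = ["service-vpn-42", "vnf-firewall", "link-A"]
roots = find_root_causes(alarms, DEPENDS_ON)
impacted = affected_subscribers(roots[0], DEPENDS_ON, SUBSCRIBERS)
print(roots, impacted)
```

In this toy topology the three alarms collapse to a single root cause, and the same graph answers the question of which subscribers are affected without a manual cross-check of separate databases.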
When issues arise, the solution functions as your primary network operations dashboard, automatically correlating symptoms to identify root causes and deliver those details and the insights to the dashboard. This primary and unified dashboard enables you to quickly prioritize and respond to critical issues instead of spending hours sifting through low‑priority alerts.
Figure 2: Unified service assurance dashboard.
Since the solution tracks hundreds of thousands of physical and virtual network elements across multiple network domains built with multiple vendors while capturing the relationship between all the elements and services in your network, the dashboard displays an always‑accurate view of your end-to-end networks.
The Future: From insight to automation
Having the right tools to achieve your goals is paramount. Even with the best processes and people in place, it is impossible to overestimate the power of a well-designed automated solution to help you do your job.
System design is especially important when looking ahead. Tools must not only help meet your current goals but also leave the door open for future evolution. By combining service assurance capabilities with the powerful telco-focused automation capabilities of Telco Cloud Platform, you can tap automation to streamline remediation, even as you modify or expand your network:
- Introduce more automation in how rules are added and updated, to ensure that more accurate and up-to-date rules are in place for identifying the anomalies in the network. Automation will not only simplify the remediation of the anomalies but also make it easier and faster to support evolving services and introduce new ones.
- Reduce the burden on your operations teams to write, test, and maintain the rules by introducing automated closed-loop remediation in how faults are reported and, more critically, in how they are fixed. Automated closed-loop remediation reduces trouble resolution time from hours to minutes, or from minutes to seconds. This reduction provides more dynamic assurance with less human involvement to notify customers and take remedial action.
Figure 3: Automated closed-loop remediation.
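As a rough illustration of the closed-loop idea, the Python sketch below detects a fault, applies a mapped remediation, verifies the outcome, and escalates to a human only when automation cannot resolve it. It is a simplified sketch under stated assumptions, not the Telco Cloud Platform API: the `Fault` and `ClosedLoop` classes, the `restart_function` action, and the fault kinds are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Fault:
    element: str           # hypothetical network element name
    kind: str              # hypothetical fault category
    resolved: bool = False


def restart_function(fault: Fault) -> bool:
    """Stand-in remediation; a real action would call an orchestrator
    and then re-check the element's health."""
    return True  # simulated: the post-restart health check passed


@dataclass
class ClosedLoop:
    # Maps a fault kind to a remediation action that returns True on success.
    playbook: Dict[str, Callable[[Fault], bool]]
    escalated: List[Fault] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def handle(self, fault: Fault) -> None:
        action = self.playbook.get(fault.kind)
        if action is None:
            self._escalate(fault, "no automated action defined")
            return
        fault.resolved = action(fault)  # act, then record the verified result
        if fault.resolved:
            self.log.append(f"resolved {fault.element} via {action.__name__}")
        else:
            self._escalate(fault, "automated action did not clear the fault")

    def _escalate(self, fault: Fault, reason: str) -> None:
        # Anything automation cannot fix is handed to the operations team.
        self.escalated.append(fault)
        self.log.append(f"escalated {fault.element}: {reason}")


loop = ClosedLoop(playbook={"function_unresponsive": restart_function})
auto = Fault("upf-01", "function_unresponsive")
manual = Fault("link-A", "fiber_cut")
for f in (auto, manual):
    loop.handle(f)
print("\n".join(loop.log))
```

Keeping the playbook as plain data means new fault kinds can be covered by adding an entry rather than rewriting the loop, which mirrors the modular approach described in this section.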
In the past, automation of this kind would have been difficult to attain, or it would have meant starting afresh with new technology. The service assurance and automation capabilities from VMware, however, are designed so that this kind of automation can be added modularly for 5G and beyond.