Modern apps need to run in multi-cluster, multi-cloud environments across a mix of traditional and microservices architectures. In this context, enterprise platform, infrastructure, and operations teams face unique challenges in securely connecting and managing modern workloads, delivering scalable services, bridging between traditional VM workloads and containers, and supporting production operations for modern apps.
VMware recently introduced the “VMware Modern Apps Connectivity solution”, which brings together the advanced capabilities of Tanzu Service Mesh (TSM) and VMware NSX Advanced Load Balancer (formerly Avi Networks) to address today’s unique enterprise challenges.
In this blog, we’ll take a deeper look at this solution and demonstrate how its cloud-native principles enable a set of important use cases that automate the process of connecting, observing, scaling, and better securing applications across multi-site environments and clouds. We’ll also show how state-of-the-art capabilities in this solution — like Global Server Load Balancing (GSLB) and Intelligent Autoscaling — enable enterprises to deliver advanced use cases such as cloud-bursting.
Step 0: Setup (typical HA architecture for a modern distributed app)
Let’s start by looking at our set-up, which is a typical architecture for a highly-available modern app deployment across a hybrid cloud. For this demo, we have the following set-up:
- We are simulating a hybrid set-up with two different GSLB sites, “cloud” and “on-prem”. This demo will work in a similar manner in a real hybrid set-up.
- We have three Kubernetes clusters that have been onboarded to Tanzu Service Mesh, of which two are on the on-prem site and one is on the cloud site.
- Since each of these sites is acting as a GSLB site, we have also deployed an NSX Advanced Load Balancer controller in each of them. These controllers are configured to provide GSLB across both sites in an active-active configuration, with one site acting as the GSLB leader and the other as a follower.
Step 1: NSX Advanced Load Balancer integration plus other configurations
We have also deployed the example “Acme Fitness Application” (github), which comprises multiple polyglot microservices. It has been deployed in a split mode, distributed across the two on-prem clusters. To start with, there is no application deployed on the cloud cluster.
For apps deployed in Tanzu Service Mesh to start using NSX Advanced Load Balancer’s GSLB setup, we configure TSM to point to the GSLB leader. We also configure the Tanzu Service Mesh Service Level Objective (SLO) feature and apply it to the front-end service of our application.
Use case 1: Multi-cluster application pattern and public services
The first use case that we are going to showcase is how Tanzu Service Mesh’s Global Namespaces (GNS) construct (explained in detail here and here) enables secure multi-cluster service discovery and connectivity in a seamless manner, without developers needing to worry about where app services are deployed. With this power of abstraction, discovering and securely connecting these distributed services (using mutual TLS) across multiple clusters becomes super easy.
With this new solution and its integration with NSX Advanced Load Balancer, we not only enhance the power of GNS to configure and secure public access to services within the mesh, but also make that access resilient by letting users define GSLB policies on it (including the configuration of various GSLB algorithms and health checks). TSM automatically handles dynamic endpoint discovery, network plumbing for endpoint connectivity, GSLB configuration, and more. As a result, the user need not worry about where the application is deployed or whether it is added, deleted, or updated across clusters and clouds; TSM ensures that traffic is always routed to a healthy service (based on the GSLB config).
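To make the reconciliation behavior concrete, here is a minimal Python sketch of how a control loop might fold discovered mesh endpoints into a GSLB record and route only to healthy ones. All class and field names here are hypothetical illustrations, not the TSM or NSX Advanced Load Balancer API.

```python
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    """One service backend as seen by the GSLB: a cluster plus an address."""
    cluster: str
    address: str
    healthy: bool = True

@dataclass
class GslbService:
    """A public FQDN whose endpoint set tracks service discovery."""
    fqdn: str
    endpoints: list = field(default_factory=list)

    def reconcile(self, discovered):
        # Replace the endpoint set with whatever discovery currently reports,
        # so additions/removals across clusters need no manual GSLB edits.
        self.endpoints = list(discovered)

    def routable(self):
        # Only healthy endpoints ever receive traffic.
        return [e for e in self.endpoints if e.healthy]

# A service appearing on a new cluster is picked up on the next reconcile:
svc = GslbService("shopping.acme.example")
svc.reconcile([Endpoint("onprem-1", "10.0.0.10")])
svc.reconcile([Endpoint("onprem-1", "10.0.0.10"),
               Endpoint("cloud-1", "34.1.2.3")])
```

The point of the abstraction is that the application team never edits the endpoint list by hand; discovery drives it.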
Users can choose to deploy the application in a HA pattern to begin with or decide to trigger the deployment at a later stage based on SLO policy needs, as we will see in the next sections.
Note: Though this blog talks about NSX Advanced Load Balancer integration, TSM also supports Route53 as an option.
Use case 2: Service level objective-based intelligent autoscaling
Once the public service is configured and external users can access the application, TSM’s intelligent autoscaling capabilities (Predictive Response Time Controller) allow users to assign the application an end-to-end latency SLO policy (created as part of the initial configuration) and have TSM automatically optimize and self-heal the application to achieve the SLO.
TSM does this by automating the detection and reporting of application intent violations, as well as providing advanced forensics and intelligent self-healing capabilities — such as autoscaling — which are able to stop disruptions before they occur. Let’s see that in action in the next part of the demo.
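As a rough illustration of the idea (the real Predictive Response Time Controller is predictive and far more sophisticated), a latency-SLO-driven scaler can be sketched as a simple reactive policy; the function name, thresholds, and hysteresis band below are assumptions for illustration only.

```python
def desired_replicas(current, p99_ms, slo_ms, max_replicas=10):
    """Toy SLO-driven scaling policy: scale out on SLO violation,
    scale in only when latency is comfortably under the SLO."""
    if p99_ms > slo_ms:
        # SLO violated: add capacity (bounded by max_replicas)
        return min(current + 1, max_replicas)
    if p99_ms < 0.5 * slo_ms and current > 1:
        # Well under the SLO: reclaim a replica
        return current - 1
    return current  # within the comfort band: hold steady
```

The 50% scale-in band is a guess at reasonable hysteresis; without it, a scaler oscillates around the SLO boundary.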
Use Case 3: Cloud bursting
As we increase the traffic to our public service (simulating a traffic spike), and as the resources of the clusters on the on-prem site become exhausted or cross a certain threshold, users can trigger the deployment of the application on the cloud site (manually or through automatic deployment). Because the cloud-site cluster in our setup is part of the same GNS, the creation of the front-end service (the same service name configured as the public service in Part 1 of the demo) automatically triggers TSM to update the GSLB configuration with a new endpoint pointing to the cloud cluster, steering traffic to the new endpoint based on the GSLB configuration (which by default is round-robin).
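The burst-and-balance behavior can be sketched in a few lines of Python; the threshold, endpoint names, and `should_burst` helper are hypothetical, and the default round-robin distribution is what the source describes.

```python
from itertools import cycle

def should_burst(onprem_cpu, threshold=0.8):
    """Trigger cloud deployment when on-prem utilization crosses a threshold
    (the 80% value is an illustrative assumption)."""
    return onprem_cpu >= threshold

endpoints = ["onprem-site.acme.example"]
if should_burst(0.92):                           # simulated traffic spike
    endpoints.append("cloud-site.acme.example")  # new GSLB endpoint appears

# Default GSLB algorithm is round-robin: traffic alternates across sites.
rr = cycle(endpoints)
first_four = [next(rr) for _ in range(4)]
```

Once the spike subsides, removing the cloud endpoint would collapse the rotation back to the on-prem site with no client-visible change.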
This part of the demo showcases how this important and complex use case of cloud-bursting becomes so easy to achieve with this solution.
Use case 4: A more resilient failover mechanism (Enhanced High Availability)
In the event the public service backend goes down on one cluster or site, the GSLB in NSX Advanced Load Balancer would normally black-hole traffic until its health checks detect that the service endpoint is down (which could take tens of seconds, depending on the number of retries and other settings).
In other words, the GSLB service health check in NSX Advanced Load Balancer has a default failure-check count (the number of consecutive failed health checks before the server is marked down) of 3, with a send interval of 10 seconds between checks. So it will mark the service down only after 30 seconds, and until then it will keep sending traffic to the endpoint even though the backend service is down, which is sometimes referred to as “blackholed traffic”.
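The worst-case black-hole window with these defaults is just the product of the two settings, as a quick sanity check shows:

```python
def blackhole_window_s(failed_checks=3, send_interval_s=10):
    """Seconds of blackholed traffic before the GSLB marks an
    endpoint down: consecutive failed checks x send interval.
    Defaults match the values described in the text."""
    return failed_checks * send_interval_s

window = blackhole_window_s()  # 30 seconds with the defaults
```

Tightening either setting shortens the window, at the cost of more health-check traffic and a higher risk of flapping on transient failures.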
With TSM, the ingress gateway running on the destination cluster will detect the failure first and will shift traffic to the other cluster while the GSLB catches up. Hence, with this solution, you end up with minimal packet loss. (Note: this would not work if the whole cluster or the gateway went down; in that case, failover would be initiated based on the GSLB timeout configuration.)
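The two-tier failover described above can be summarized in a tiny decision sketch; the function and cluster names are illustrative, not the gateway's actual routing logic.

```python
def route(local_backend_healthy, remote_cluster_healthy):
    """Two-tier failover: the gateway shifts traffic to the other
    cluster as soon as the local backend fails, long before the
    GSLB's ~30-second health-check window expires."""
    if local_backend_healthy:
        return "local-cluster"
    if remote_cluster_healthy:
        return "remote-cluster"  # fast, gateway-level failover
    return None  # whole-site/gateway failure: only GSLB timeout can help
```

The `None` branch corresponds to the caveat in the text: if the gateway itself is gone, there is nothing left on-site to redirect traffic, so recovery waits on the GSLB.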
We plan to add more features as part of this solution in coming releases, with capabilities not limited simply to load balancing and routing, but also enabling additional features around enterprise-grade security for public services and more.
As the leader in delivering the Virtual Cloud Network, VMware understands the challenges of creating operationally simple models for modern app connectivity and security. Tanzu Service Mesh and NSX Advanced Load Balancer provide an optimal path to enterprise modernization by enabling secure connectivity with consistent policies not only for modern apps across hybrid and multi-cloud environments — as we saw in this blog — but also extending to include traditional applications running in VMs. Stay tuned for the next part of this blog, where we’ll see how the Modern Apps Connectivity Solution enables the VM expansion use case without the need for modifying the VM application or its deployment model.