Progressive Updates to Cloud Native Apps Using Tanzu Service Mesh Traffic Management

Shakeeb Ansar, Deepa Kalani, and Abhirama Mallela co-wrote this post.

Releasing new features seamlessly with no downtime in a rapidly evolving microservices-based application can be challenging. VMware Tanzu Service Mesh makes this process easier, removing much of the complexity involved with rolling out progressive updates to cloud native apps. Here we explain how it works.

Site-reliability engineers (SREs) face many Day 2 challenges when dealing with ever-shortening release cycles of cloud native applications, as we discussed in an earlier blog post. For example, any issue identified after a new feature rollout has the potential to impact all users, which in turn could impact the business. To mitigate such risks, new versions of microservices can be validated by exposing them to a small portion of live traffic, observing the data (metrics, KPIs) emitted by them, and then shifting the remaining traffic in a controlled manner if the new feature is stable. 
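The idea of exposing a new version to only a small portion of live traffic can be sketched as a weighted choice made per request. The snippet below is an illustration only, not how Tanzu Service Mesh or its proxies implement routing: the function names, the 95/5 split, and the request count are all assumptions made for this example.

```python
import random


def pick_version(weights, rng):
    """Pick a version name with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def simulate(weights, requests=10_000, seed=0):
    """Route `requests` simulated requests and count how many hit each version."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    counts = {version: 0 for version in weights}
    for _ in range(requests):
        counts[pick_version(weights, rng)] += 1
    return counts


# Hypothetical split: 95% of requests stay on v1, 5% reach the new v2.
counts = simulate({"v1": 95, "v2": 5})
```

With a split like this, only a small share of requests reaches v2, so a regression in the new version affects a small share of users while its metrics are being observed.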

Blue/green deployment, canary deployment, and A/B testing are among the most widely practiced techniques for shifting traffic gracefully. Although they can be implemented ad hoc, doing so adds significant overhead and complexity: it requires careful orchestration when deploying the newer versions, managing instance counts of the current and newer versions, and tweaking ingress rules. These complexities grow further when an application is deployed on multiple clusters.

How do we overcome these challenges? Enter service meshes! Using a service mesh significantly reduces complexity by providing the ability to configure traffic management policies on the proxies without having to modify applications. But vanilla service meshes still do not address the complexities of managing distributed applications, such as ensuring uniform and consistent application of policies across clusters. This blog post demonstrates how VMware Tanzu Service Mesh can help with that.

Today, designing for multicluster and multiregion deployments is the norm rather than the exception. As a result, we are beginning to see another pattern emerge: the implementation of these progressive upgrade strategies at a complete application level (i.e., across clusters and regions) rather than at an individual cluster level. This is where the power and flexibility of Tanzu Service Mesh’s Global Namespaces (GNS) can be coupled with its traffic management capabilities. A Global Namespace lets you view and manage a geo-distributed application through a single application-level abstraction. By applying traffic management policies directly to services within a GNS, users define policies once, centrally, and do not have to manage traffic policies at an individual cluster level or ensure that the policies across clusters are in sync. Tanzu Service Mesh takes care of that.

To illustrate this, we’ll show how you can perform a canary upgrade using Tanzu Service Mesh and the familiar Istio Bookinfo sample application, deployed into a Global Namespace made up of multiple clusters. To keep the example simple, we illustrate the upgrade of only a single service within the application.

The above diagram shows two clusters, C1 and C2, connected to Tanzu Service Mesh with the “bookinfo” namespace from each cluster linked to form a “bookinfo” global namespace. Both C1 and C2 have the bookinfo application deployed but with only v1 of the reviews service deployed.

1 – Create a Global Namespace and deploy an application across multiple clusters as shown in the diagram above.

2 – Roll out version v2 of the reviews service in one of the clusters, and define an autoscaling policy for version v2.

3 – Create a progressive upgrade plan that uses the canary strategy to gradually shift traffic from reviews-v1 to reviews-v2. In this example, we are using a simple time-based plan in which we bump up traffic sent to reviews-v2 by 25 percent every 20 seconds. At every 20-second interval, we verify whether the metrics/KPIs from reviews-v2 satisfy the thresholds and proceed with the upgrade if they do.

4 – Begin the progressive upgrade.

5 – Migrate all traffic to the new version.
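The time-based plan from step 3 can be sketched as a simple loop: raise the share of traffic sent to reviews-v2 by 25 percent, wait 20 seconds, check the KPIs, and continue only if they pass. This is a minimal sketch under stated assumptions, not the Tanzu Service Mesh API. `CanaryPlan`, `run_canary`, and the `kpis_healthy` callback are hypothetical names, and rolling all traffic back to reviews-v1 on a failed check is an assumption, since the post does not say what happens when thresholds are not met.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CanaryPlan:
    step_percent: int = 25      # traffic moved to the new version per step
    interval_seconds: int = 20  # wait between steps


def run_canary(
    plan: CanaryPlan,
    kpis_healthy: Callable[[int], bool],
    sleep: Callable[[float], None] = lambda seconds: None,
) -> Tuple[List[Tuple[int, int]], bool]:
    """Return (history of (v1_percent, v2_percent) splits, completed flag)."""
    history: List[Tuple[int, int]] = []
    canary = 0
    while canary < 100:
        canary = min(100, canary + plan.step_percent)
        history.append((100 - canary, canary))
        sleep(plan.interval_seconds)  # pass time.sleep to wait for real
        if not kpis_healthy(canary):
            # Assumption: a failed check sends all traffic back to v1.
            history.append((100, 0))
            return history, False
    return history, True


# Hypothetical KPI check that always passes.
history, completed = run_canary(CanaryPlan(), lambda percent: True)
# history == [(75, 25), (50, 50), (25, 75), (0, 100)], completed is True
```

The `sleep` parameter defaults to a no-op so the schedule can be inspected instantly. A real rollout would pass `time.sleep` and a `kpis_healthy` function that queries actual metrics.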

For a visual walkthrough showing how application owners can perform a canary upgrade using Tanzu Service Mesh's traffic management capabilities, see the video below.