YES! You can declare your application resiliency state and keep it like that with a combination of Kubernetes and the new application resiliency capabilities in Tanzu Service Mesh.
First things first: what is Tanzu Service Mesh?
Tanzu Service Mesh allows you to create and isolate a logical structure in a Kubernetes cluster, or across different clusters, to achieve an application-layer (Layer 7) networking and security fabric that you can add value on top of. Just by connecting the dots, we get service discovery, observability, security, and encrypted connectivity for all objects in that global namespace structure. More about TSM global namespaces in excellent blogs here and here.
In this blog, I focus on a new feature that (in my opinion) is a real game-changer for the way we operate and manage application resiliency. As background, I worked on the customer side for most of my technical career, in operations and infrastructure roles, and the thing I was most concerned with was the application and user experience. We had multiple application monitoring solutions that continuously tested user experience, either with synthetic transactions (not real user ones) or by tapping real transactions to capture the live experience. Once we got an alarm that something was wrong with latency and/or the experience, we moved to identify the root cause in a root cause analysis (RCA) process. But in the time between identification and resolution, our application wasn’t behaving in a healthy manner.
Now, what if we could declare our desired (expected as healthy) application behavior, just like we declare a Kubernetes manifest? Not just stating a threshold, but declaring a healthy application state which Kubernetes would then enforce? Sound like science fiction? Well, it’s not, because that’s exactly what we will soon deliver with a combination of Kubernetes and the new Tanzu Service Mesh feature of application SLOs and application autoscaling.
With Tanzu Service Mesh declarative SLOs, you can define user experience SLIs such as p50, p90, and p99 latency (p50 is the latency that 50% of transactions stay under, p90 the latency that 90% stay under, and so on), or performance metrics like CPU consumption, requests per second, and more. With this configuration, you will be notified when an SLO is violated, and thus gain visibility into the health of your application. The really cool science-fiction part is: you can act on that definition using a distributed autoscaler to automatically scale your application deployment and return to your intended state or SLO. That SLO definition of the application is declared in the Kubernetes clusters that are part of the Global Namespace in TSM with a new custom resource definition (CRD): autoscaling.tsm.tanzu.vmware.com.
You can read more about that feature in the next blog.
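To make the percentile SLIs mentioned above a bit more concrete, here is a minimal Python sketch of how p50, p90, and p99 can be computed from a batch of latency samples using the nearest-rank method. This is only an illustration: the function name, the sample values, and the choice of nearest-rank are my assumptions, not how Tanzu Service Mesh computes its metrics internally.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it.

    Hypothetical helper for illustration only.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up request latencies in milliseconds (not taken from the demo).
latencies_ms = [120, 95, 110, 480, 130, 105, 990, 115, 125, 100]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p90={p90}ms p99={p99}ms")
```

Notice how a couple of slow requests barely move p50 but dominate p99, which is why tail percentiles are a useful signal for user experience.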
Now for a demo
The following demo is based on the ACME application, which is a polyglot application built in a microservices architecture. I’m using this same application across all my TSM demos because it’s easy to slice components and get visibility.
In this demo, I’m distributing services in multiple clusters, where the ACME front-end service in Kubernetes is called “shopping” and all the back-end services run on a separate cluster. When you connect those objects under the same global namespace, you have a service discovery map. This can work not only across clusters in the same site but also across clouds; check out the following blog to see how far you can take this. The screenshot below is from my GNS:
In this topology map we can see the connectivity between the front end and all the services in the back end. Now, I will create an SLO definition for my front end that will enforce my desired user experience. By hovering over the service itself, you can see the current p99 latency: ~960ms. I will now create an SLO definition that will alert me when that latency goes above 500ms; basically, we are simulating an error in the SLO policy.
The configuration of the SLO definition will be p99 < 500ms, and then we add the percentage of time to meet that requirement. In my case, to simulate an error, I’m configuring 100% of the time. So the app p99 latency should stay under 500ms all of the time.
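The "p99 < 500ms for X% of the time" idea can be sketched in a few lines of Python. I'm assuming here that compliance is measured as the share of equally sized time windows whose p99 stays under the threshold; the source does not say how Tanzu Service Mesh evaluates this internally, and the function names and sample values are hypothetical.

```python
def slo_compliance(p99_per_window_ms, threshold_ms):
    """Percentage of time windows whose p99 latency is under the threshold.

    Assumption: every window covers the same amount of time.
    """
    if not p99_per_window_ms:
        raise ValueError("need at least one window")
    healthy = sum(1 for value in p99_per_window_ms if value < threshold_ms)
    return 100.0 * healthy / len(p99_per_window_ms)


def slo_met(p99_per_window_ms, threshold_ms, target_percent):
    """True when the measured compliance reaches the target percentage."""
    return slo_compliance(p99_per_window_ms, threshold_ms) >= target_percent


# Made-up p99 readings for four windows; one of them is a ~960ms spike.
windows_ms = [420, 480, 960, 450]

compliance = slo_compliance(windows_ms, threshold_ms=500)
violated_at_100 = not slo_met(windows_ms, threshold_ms=500, target_percent=100)
print(f"compliance={compliance}% violated_at_100={violated_at_100}")
```

With a 100% target, a single window above 500ms is enough to put the SLO in violation, which is exactly why that target works well for simulating an error.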
The only service to which I’m applying the SLO policy is the single front-end item called “shopping”.
By applying the policy, I can immediately see the error and violation of the policy in the performance page of the “shopping” service.
Now I’m applying the declarative state for autoscaling my application to my Kubernetes cluster. I’m stating that the deployment can scale up to 10 replicas when p99 latency exceeds 500ms, and that it will scale back down when p99 latency drops below 300ms.
The violation of the SLO immediately triggered the autoscaler on the Kubernetes side. I can see the deployment scaling up to meet the intended state of the application SLO. You can see that on the Kubernetes side by getting pods in the namespace:
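The two thresholds form a dead band (scale up above 500ms, scale down below 300ms, do nothing in between), which keeps the replica count from flapping. Here is a rough Python sketch of that decision logic. It is not the Tanzu Service Mesh autoscaler's actual algorithm: the function name, the one-replica step size, and the minimum of one replica are my assumptions; only the 500ms and 300ms thresholds and the maximum of 10 replicas come from the demo.

```python
def desired_replicas(current, p99_ms, scale_up_ms=500, scale_down_ms=300,
                     min_replicas=1, max_replicas=10):
    """Return the next replica count for one evaluation cycle.

    Hypothetical logic: step by one replica per cycle and stay put while
    p99 sits between the two thresholds.
    """
    if p99_ms > scale_up_ms:
        return min(current + 1, max_replicas)  # never exceed the declared max
    if p99_ms < scale_down_ms:
        return max(current - 1, min_replicas)  # never drop below the floor
    return current                             # inside the dead band


# Made-up p99 readings over six evaluation cycles.
replicas = 1
history = []
for p99_ms in [960, 800, 620, 450, 280, 250]:
    replicas = desired_replicas(replicas, p99_ms)
    history.append(replicas)
print(history)
```

At 450ms the replica count holds steady: latency is back under the SLO threshold but not yet low enough to justify giving capacity back.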
You can also see that action on the Tanzu Service Mesh console in the Performance and Instances pane:
When application latency goes down, the intelligent autoscaler will scale the deployment back down, always making sure it maintains the intended state you declared in the SLO policy. This can also be based on CPU and memory, not only latency, and we’ll add more SLOs as the platform evolves.
To sum up, this is a game-changer in the way we use policies and declarative infrastructure in a multi-cloud/multi-cluster distributed manner in order to keep our intended state(s), not just on the infrastructure side but now also on the application side. And we’re only just getting started. Stay tuned!