In this article, we outline how to identify candidates for serverless architectures within your Kubernetes clusters, and how doing so can improve the peak-load performance of your cloud services while saving on monthly cloud infrastructure costs.
Myth: Containerized microservices are so small that you're unlikely to see significant cost savings from most cloud infrastructure optimizations.
Fact: The CloudHealth platform helped VMware save thousands of dollars in monthly infrastructure costs on just a few microservices.
In this article, we'll outline how, starting from just a single service, you can leverage CloudHealth to improve peak-load performance for several other cloud services and, at the same time, save significantly on monthly cloud infrastructure costs.
Identifying optimization opportunities in a microservice architecture
Many organizations today benefit from container orchestration technologies like Kubernetes to manage their microservices architecture. However, because each service within the architecture can be shared, reused, or dependent on several other services, it can be difficult to identify consistent resource usage patterns and the associated optimization opportunities.
From regular infrastructure monitoring, we can occasionally spot patterns in resource usage and application behavior. For example, in the image below, you can see that the microservices typically have a smaller resource footprint on weekends. The following screenshots are from Tanzu Observability (Wavefront), which we use for observability and monitoring.
In other cases, microservices show peaks or bursts of usage over short intervals, as seen in the image below.
In this example, resource usage suggests that the application is almost idle most of the time, with burst CPU usage for about one hour, twice daily. We may be able to save resources and costs by limiting the resources allocated outside these burst windows; however, it's important to have a good understanding of the application before performing any disruptive optimizations.
Figuring out the reason behind such a pattern in any application requires consulting the development team to understand its architecture and behavior. When you have development teams across three geo-locations and several shared applications, this becomes a challenging task.
To add to the complexity, usage patterns (CPU, memory, etc.) usually vary depending on the resource and on the granularity of the data considered (a 12-hour versus a two-day timespan).
As you can see, once applications are developed and deployed, they're much harder to optimize. There are already hundreds of microservices with thousands of metrics, and developers are usually more focused on delivering new services to customers faster than on optimizing existing architectures.
Leverage CloudHealth to optimize your microservices architecture
Unless you have concrete numbers to quantify possible savings, it’s challenging to justify taking action on infrastructure optimization recommendations. The CloudHealth platform enables you to see granular information about resource usage patterns and predict cloud cost savings from architecture optimizations.
To dive deeper into our microservices, we created a Perspective for each of our Kubernetes clusters using the already-tagged resources (like the EC2 instances corresponding to the nodes). With this, we were able to calculate the cost of each Kubernetes cluster and, by extension, of each namespace.
We interpolated the cost of each Kubernetes pod using its share of allocated resources, keeping in mind that this particular pod had five replicas. The AWS nodes in our clusters are billed at about $725/month each, excluding enterprise discounts, with average node specifications of 128GB of RAM and 16 vCPUs. With 8GB and 4 vCPUs allocated to this pod, it held approximately 31.25% ((8/128) + (4/16)) of a node's resources. Therefore, the cost of five replicas is about $1,125 per month.
Considering that we only use these allocated resources two hours a day, we had the potential to save 92% (22/24 hours) of that cost, which comes to roughly $1,035 on a single microservice.
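For reference, here's the same back-of-the-envelope calculation as a small Python snippet. The node price, pod requests, replica count, and active hours are the figures quoted above; the dollar figures in the text are rounded:

```python
# Interpolate a pod's cost from its share of node resources (figures from above).
NODE_COST_PER_MONTH = 725.0          # USD per node, excluding enterprise discounts
NODE_RAM_GB, NODE_VCPUS = 128, 16    # average node specifications
POD_RAM_GB, POD_VCPUS = 8, 4         # resources allocated to this pod
REPLICAS = 5
ACTIVE_HOURS_PER_DAY = 2

# Share of one node held by one pod: memory share + CPU share.
share = POD_RAM_GB / NODE_RAM_GB + POD_VCPUS / NODE_VCPUS   # 0.3125 -> 31.25%

monthly_cost = share * NODE_COST_PER_MONTH * REPLICAS        # ~ $1,133 (~$1,125 rounded)
potential_savings = monthly_cost * (24 - ACTIVE_HOURS_PER_DAY) / 24  # ~ $1,038 (~$1,035 rounded)

print(f"share={share:.2%}  cost=${monthly_cost:,.0f}/mo  savings=${potential_savings:,.0f}/mo")
```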
Once we understood this particular application, we saw that a periodic execution happens every 12 hours for every customer. However, several other out-of-schedule executions can happen at any time, such as customer onboarding or customer-initiated execution requests via CloudHealth support. These don't carry as big a load and aren't glaringly visible in the resource usage.
We've used the Vertical Pod Autoscaler to tune resources, but in many cases it isn't flexible enough to handle a sudden peak load. What we needed was a serverless architecture (think Lambda), but with an option to trigger ad-hoc requests immediately. Since the initial presumption that the cost improvements would be insignificant turned out to be false (thanks to the CloudHealth platform), we could immediately prioritize a revamp of this particular application.
We chose a hybrid model: a job framework on Kubernetes handles the scheduled executions in the form of serverless functions, while regular Kubernetes pods (with only two replicas and fewer resources) handle low-delay ad-hoc executions.
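As a rough illustration of the ad-hoc path, here's a minimal sketch using the Kubernetes Python client to launch a one-off Job. The image name, namespace, environment variable, and resource requests are hypothetical placeholders, not our actual configuration:

```python
# Minimal sketch: launch an ad-hoc execution as a one-off Kubernetes Job,
# while a CronJob (not shown) covers the every-12-hours schedule.
from kubernetes import client, config

def trigger_adhoc_run(customer_id: str) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    container = client.V1Container(
        name="adhoc-runner",
        image="registry.example.com/periodic-worker:latest",  # hypothetical image
        env=[client.V1EnvVar(name="CUSTOMER_ID", value=customer_id)],
        resources=client.V1ResourceRequirements(
            requests={"cpu": "1", "memory": "2Gi"},  # far smaller than the old pods
        ),
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(generate_name=f"adhoc-{customer_id}-"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            ),
            ttl_seconds_after_finished=600,  # clean up finished jobs automatically
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```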
The new cost of the service is approximately $550/month, more than 50% less than the original $1,125. This was one particular example we caught in the act, but there could be several others.
Automation
And since development moves at such a fast pace, application behaviors change over time. We needed automated recommendation tooling that could identify these patterns and make such infrastructure optimizations faster, so we started working on a tool that understands the interactions among microservices, detects patterns in resource usage, and surfaces recommendations. The tool essentially performs a five-step process (a simplified sketch of the first three steps follows the list):
1. For each resource, we model usage with Seasonal ARIMA time-series modeling to identify resources with repeating usage patterns.
2. We use spike detection algorithms with several thresholds to filter the above down to resources where the majority of usage comes in bursts and the application has negligible usage the rest of the time.
3. We mark intervals as active or idle, according to when this particular resource suggests the application is idle.
4. We repeat the above process for all the other resources under consideration and collate and aggregate the intervals. This aggregation is the crucial step, since we have to handle several small but valid cases, such as an imperfect overlap between resources (imagine an application reading from S3 before performing some computation on the data; the spikes in network usage will come slightly before the spikes in memory).
5. We run similar analyses across the span of interacting applications, with the span of interactions obtained from the distributed tracing feature of Wavefront by VMware.
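To make steps 1 through 3 concrete, here's a simplified Python sketch for a single CPU metric. The SARIMA order, seasonal period, spike threshold, and padding are illustrative assumptions rather than the tuned values our tool uses:

```python
# Simplified sketch of steps 1-3: fit a seasonal model to one resource metric,
# flag bursts, and mark active/idle intervals.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def mark_active_idle(cpu: pd.Series, period: int = 288) -> pd.Series:
    """cpu: usage sampled every 5 minutes (period=288 samples ~= one day).
    Returns a boolean series that is True where the application looks active."""
    # Step 1: a seasonal ARIMA model captures the repeating daily pattern.
    fitted = SARIMAX(cpu, order=(1, 1, 1),
                     seasonal_order=(1, 1, 1, period)).fit(disp=False)
    residuals = cpu - fitted.fittedvalues

    # Step 2: naive spike detection - samples well above the typical level.
    is_burst = cpu > cpu.median() + 3 * residuals.std()

    # Step 3: pad each burst by ~15 minutes on either side so ramp-up and
    # ramp-down samples count as active too; everything else is idle.
    pad = 3
    return (is_burst.astype(float)
                    .rolling(2 * pad + 1, center=True, min_periods=1)
                    .max()
                    .astype(bool))
```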
Finally, once we have this classification, we estimate the cost savings in the same way as above, by calculating the daily average time the application as a whole is active. This lets us surface only relevant recommendations, such as applications with the potential to cut costs by at least 60% or by at least $200.
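That final filter reduces to a few lines of arithmetic; a sketch using the 60% and $200 thresholds mentioned above (the function name is ours):

```python
# Keep only recommendations worth acting on: potential savings of at least
# 60% of the service's monthly cost, or at least $200/month in absolute terms.
def is_relevant(monthly_cost: float, daily_active_hours: float,
                min_pct: float = 0.60, min_usd: float = 200.0) -> bool:
    savings_pct = (24 - daily_active_hours) / 24
    return savings_pct >= min_pct or monthly_cost * savings_pct >= min_usd
```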
The overall flow of the method looks something like this:
Since this method might help several others, VMware is planning to open-source the recommendation tool in the near future. The tool and methodology were also presented at KubeCon EU 2020's serverless summit this August. You can watch the recording here.
And for more information on serverless technology and how to optimize your cloud infrastructure, see our in-depth whitepaper: Why Serverless is the Future of the IT Industry