In working with FinOps professionals, we regularly hear stories about the trials and tribulations of cloud cost management and optimization. Luckily, most of these stories have happy endings, with challenges overcome and lessons learned. But recently, we heard about an unfortunate incident in which a cloud operations manager (not a customer of ours) was relieved of their position after some critical Kubernetes clusters in their environment failed to run, resulting in irrecoverable lost revenue for the business.
Though the story didn’t end well for this individual, there are lessons to be learned for the rest of us. This kind of unfortunate incident can be avoided with the right visibility into your Kubernetes workloads. VMware Aria Cost powered by CloudHealth recently delivered usage reporting for Kubernetes-based workloads, adding to our existing set of reports for containerized cost visibility and chargeback.
Setting appropriate resource requests and limits for Kubernetes pods is critical to application performance. But without the ability to map requests and limits against actual pod usage, application developers and cloud operations managers are left in the dark. They can’t tell whether their containerized workloads are at risk of being shut down at any moment or, on the flip side, whether they are reserving capacity that goes unused and wastes precious budget.
Kubernetes usage data in VMware Aria Cost powered by CloudHealth
With this latest release of VMware Aria Cost powered by CloudHealth, users can easily see how much CPU and memory (i.e., capacity) their Kubernetes-based workloads are consuming relative to the CPU and memory requests and limits set by the application developer. The same view also shows the total capacity available to the cluster.
Optimize cluster requests and limits with real-time data
To keep Kubernetes workloads running smoothly, usage should consistently stay below the requested CPU and memory. We recommend looking at usage averages, but don’t decide to reduce requests without also looking at usage maximums. For the most critical workloads, it’s important that resource demands at peak times can still be met.
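As a rough illustration of the average-versus-peak rule above, here is a minimal Python sketch. It assumes you already have CPU usage samples (in cores) for a workload. The `Workload` class, the `rightsizing_hint` function and the 20% headroom default are hypothetical choices for this example. They are not how VMware Aria Cost powered by CloudHealth computes its recommendations.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Workload:
    name: str
    cpu_request_cores: float
    # Observed CPU usage in cores, e.g. one sample every few minutes (assumed input).
    cpu_usage_samples: list[float]


def rightsizing_hint(w: Workload, headroom: float = 0.2) -> str:
    """Suggest a request change using both average and peak usage.

    headroom is a hypothetical safety margin kept above the observed peak.
    """
    avg = mean(w.cpu_usage_samples)
    peak = max(w.cpu_usage_samples)
    if peak > w.cpu_request_cores:
        return "increase request: peak usage exceeds the request"
    target = peak * (1 + headroom)
    if target < w.cpu_request_cores:
        # Shrink only if even the peak, plus headroom, fits under the request.
        return f"reduce request toward {target:.2f} cores"
    if avg < 0.5 * w.cpu_request_cores:
        # The average alone suggests over-provisioning, but the peak needs the capacity.
        return "keep request (low average, but peaks need the capacity)"
    return "keep request"


if __name__ == "__main__":
    api = Workload("api", cpu_request_cores=2.0, cpu_usage_samples=[0.4, 0.5, 1.9])
    print(api.name, rightsizing_hint(api))
```

Consider samples of 0.4, 0.5 and 1.9 cores against a 2-core request. The average (about 0.93 cores) alone would suggest cutting the request. The sketch keeps it anyway, because the peak plus headroom does not fit under a smaller request.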
Identify potential problems before it’s too late
When usage starts to exceed the requested CPU or memory, application performance is likely to decline. The workload is now competing with other workloads for spare capacity on shared nodes. Pods using more memory than they requested are also among the first candidates for eviction when a node runs short of memory. If usage regularly reaches the limits, the risk grows: a container that hits its memory limit is terminated (OOM-killed), and one that hits its CPU limit is throttled. And if the total capacity available to the cluster is less than the workload requires, some pods may not be scheduled at all. If you are the person responsible for keeping these clusters up and running, your job may even be at risk.
Usage patterns like this one mean it’s time for the cloud operations manager to have a conversation with the application developer about increasing request amounts for this cluster to ensure that these microservices have access to all the resources they need to function.
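The thresholds discussed above can be expressed as a small classifier. This is a hedged sketch that assumes you have per-container usage samples plus the request and limit from the pod spec. `classify_sample` and `fraction_above_request` are hypothetical helper names. The categories follow standard Kubernetes behavior (memory limits enforced by OOM kill, CPU limits by throttling), not any CloudHealth-specific logic.

```python
from typing import Optional, Sequence


def classify_sample(resource: str, usage: float, request: float, limit: Optional[float]) -> str:
    """Classify one usage sample against a container's request and limit.

    resource is "cpu" or "memory"; limit is None when no limit is set.
    """
    if usage <= request:
        return "within request"
    if limit is None or usage < limit:
        # Borrowing spare node capacity; memory use above the request also makes
        # the pod an early eviction candidate if the node runs low on memory.
        return "above request"
    if resource == "memory":
        # Containers that reach their memory limit are OOM-killed.
        return "at memory limit"
    # CPU limits are enforced by throttling, not by killing the container.
    return "at cpu limit"


def fraction_above_request(samples: Sequence[float], request: float) -> float:
    """Share of samples (0.0 to 1.0) where usage exceeded the request."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > request) / len(samples)


if __name__ == "__main__":
    memory_mib = [400, 600, 700, 300]  # hypothetical samples for a 512 MiB request
    print(fraction_above_request(memory_mib, 512))
    print(classify_sample("memory", 700, 512, 1024))
```

A workload that lands in "above request" for a large share of its samples is a good candidate for the request-increase conversation.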
Cloud operations managers and FinOps professionals can use VMware Aria Cost powered by CloudHealth to drill down into Kubernetes data to identify which team, which application, and even which engineer needs to take action. Showback of usage data to those responsible for reserving capacity often has a rapid remediating effect.
For example, this report shows which namespaces are using more or less than their requested CPU hours. The user can then drill down into a namespace that is requesting too much or too little capacity and share this with the resource owner, who can take the necessary action to rightsize the requests.
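To make the namespace comparison concrete, the sketch below sums hourly CPU usage and requests into core-hours per namespace and flags the outliers. The record format, the `namespace_utilization` and `flag_namespaces` names, and the 50% under-use threshold are assumptions made for illustration. They are not taken from the CloudHealth report itself.

```python
from collections import defaultdict


def namespace_utilization(records):
    """Return used / requested core-hours per namespace.

    records: iterable of (namespace, cpu_cores_used, cpu_cores_requested),
    assumed to be one record per hour, so summing cores gives core-hours.
    """
    used = defaultdict(float)
    requested = defaultdict(float)
    for ns, cpu_used, cpu_requested in records:
        used[ns] += cpu_used
        requested[ns] += cpu_requested
    # Skip namespaces with no requests to avoid dividing by zero.
    return {ns: used[ns] / requested[ns] for ns in used if requested[ns] > 0}


def flag_namespaces(utilization, low=0.5):
    """Label namespaces that over-use or substantially under-use their CPU request."""
    flags = {}
    for ns, ratio in utilization.items():
        if ratio > 1.0:
            flags[ns] = "using more than requested"
        elif ratio < low:
            flags[ns] = "requesting more than it uses"
    return flags


if __name__ == "__main__":
    hourly = [  # hypothetical hourly records
        ("payments", 1.5, 2.0),
        ("payments", 2.5, 2.0),
        ("search", 0.5, 4.0),
        ("search", 0.7, 4.0),
        ("checkout", 3.0, 2.0),
    ]
    print(flag_namespaces(namespace_utilization(hourly)))
```

Here `payments` uses exactly what it requested and is not flagged. `search` uses about 15% of its requested core-hours, and `checkout` uses 150%.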
Perform showback and chargeback
With the introduction of Kubernetes usage data to the VMware Aria Cost powered by CloudHealth platform, FinOps professionals now have all the tools needed to perform both chargeback and showback. Cloud FinOps practitioners can tell finance teams and business leaders how much of the shared Kubernetes costs to charge back to each team, business unit, or billing group. They can also show usage data back to DevOps teams and cloud operations managers for resource optimization.
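One common way to split a shared cluster bill is to weight each team by the larger of the capacity it requested and the capacity it actually used. Requested capacity is reserved whether or not it is consumed. The sketch below shows that approach. The function name, the input format and the max(requested, used) weighting are assumptions for illustration, not a description of how VMware Aria Cost powered by CloudHealth allocates costs.

```python
def allocate_shared_cost(
    total_cost: float, core_hours: dict[str, tuple[float, float]]
) -> dict[str, float]:
    """Split total_cost across teams.

    core_hours maps team -> (requested core-hours, used core-hours).
    Each team is weighted by max(requested, used): teams pay for what they
    reserved, plus any usage beyond their reservation (assumed policy).
    """
    weights = {team: max(req, used) for team, (req, used) in core_hours.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Nothing requested or used: allocate nothing rather than divide by zero.
        return {team: 0.0 for team in core_hours}
    return {team: total_cost * w / total_weight for team, w in weights.items()}


if __name__ == "__main__":
    shares = allocate_shared_cost(
        1000.0,
        {"team-a": (30.0, 10.0), "team-b": (10.0, 50.0), "team-c": (20.0, 20.0)},
    )
    for team, amount in shares.items():
        print(f"{team}: ${amount:.2f}")
```

This prints $300.00, $500.00 and $200.00 for the three hypothetical teams. Because the weights are normalized, idle cluster capacity is spread across teams in proportion to their weights. Some organizations instead report idle capacity as a separate line item.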
If you’d like to learn more about Kubernetes cost allocation and usage optimization with VMware Aria Cost powered by CloudHealth, check out our white paper: FinOps for Kubernetes.