Since the Cluster API 1.12 release announcement [1], I keep getting questions about immutability in Kubernetes.
Amid this new wave of interest, what stands out is how the Kubernetes community is looking at this topic from a different perspective than in the past.
Today, adopters are less focused on immutability at the individual node level and more interested in how immutable principles apply when operating an entire fleet of clusters.
This is the fundamental reason why the Cluster API project exists, and in this blog I’d like to share my perspective as a Cluster API maintainer.
Why immutability is important
The importance of immutability derives from the benefits this approach provides to the applications that are running on top of Kubernetes.
Based on my experience, the most important benefits that Kubernetes adopters derive from immutability are the following.
Speed
In modern IT, speed is paramount.
For example, you need speed when scaling up your infrastructure to handle spikes in user requests; you need speed when performing blue/green deployments or when scaling down the infrastructure to keep application costs under control or to free up resources for other tasks.
Speed is also crucial when handling maintenance operations that must fit into a limited time window, or when handling disruptions of any size, from an issue with a single application instance to an entire region going down.
Immutability is a foundational enabler to achieve speed in all the above scenarios and even more.
It is thanks to immutability that you can have tools for spinning up ten identical Pods in milliseconds, tools to create new clones of Machines hosting Kubernetes nodes in seconds, or to spin up a new, fully operational Kubernetes cluster in minutes.
Operations at scale
From a certain point of view, IT operations nowadays sit at the intersection between scale and speed.
Immutability is crucial to deal with those kinds of problems, because immutability provides the foundation for creating consistent and trusted clones of your pods, of your VMs, of your entire Kubernetes cluster.
When you start using such trusted clones, you gain confidence that your dev, QA, and production environments behave consistently.
Without trusted clones, your operations team will have to deal with configuration drift, and snowflakes don’t scale well.
Security
Immutability is also often linked to security.
This is generally true, and in my experience immutability gives you a few distinctive advantages from a security perspective. For other aspects of security, however, there are valid alternatives that an organization can and should consider.
As an example of a distinctive advantage, immutability provides the foundation for building systems where the underlying VMs are ephemeral because they are periodically recreated from scratch, thus providing a moving target that makes attackers’ work much more complex.
On the other hand, an immutable OS can be used successfully to reduce the attack surface inside a VM, but many organizations achieve the same goal by using AppArmor, SELinux, hardened images, etc.
Continuing with the same example, the A/B upgrade process that is typical of an immutable OS also has a valid and possibly more robust alternative: rolling out changes by creating new machines and deleting old ones.
Same outcome, but re-using simple create and delete primitives like Kubernetes does for Pods.
Stability
As a software engineer and open source maintainer, I’m always dealing with the fact that there is a limited amount of complexity that a project or a system can absorb over a certain amount of time.
When the complexity budget runs out, bad things happen. Quality decreases and we can’t fix bugs and CVEs in a timely manner.
Immutability greatly helps in dealing with the complexity budget. It allows engineers to drastically reduce the number of variables to be considered, e.g., when managing the lifecycle of objects like a VM that is hosting a Kubernetes Node.
As a result, systems based on immutability usually provide a simpler, more stable and reliable platform for your application.
Stability and reliability matter today, and they matter just as much for the long-term sustainability of your application stack.
How to achieve immutability in Kubernetes
Immutability at its core is a concept, a philosophy. To make it real, you need technologies that embrace this concept.
Kubernetes is a great starting point: it provides primitives like immutable Pods, and higher-level abstractions like Deployments, StatefulSets, etc. to manage these Pods.
But Kubernetes doesn’t take care of the underlying infrastructure.
This is where Cluster API fits in, because one of the fundamental tenets of the project is “it’s Kubernetes all the way down”.
Cluster API makes it possible to treat a Machine, a piece of infrastructure like a VM that is hosting a Kubernetes node, as an immutable component.
When a change is required, instead of modifying the Machine in place, Cluster API creates a new machine and deletes the old one, similar to how Kubernetes replaces Pods.
And by offering abstractions to manage groups of Machines, like KubeadmControlPlane and MachineDeployment, you can use Cluster API to amplify the benefits of immutability while managing an entire cluster or a fleet of Clusters.
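To make the create-and-delete idea concrete, here is a minimal Python sketch. It is not Cluster API code: the `Machine` class, the `rollout` function, and the generated names are hypothetical and only illustrate the principle that an outdated Machine is replaced rather than modified.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)  # frozen: a Machine cannot be edited after creation
class Machine:
    name: str
    version: str


def rollout(machines, desired_version, name_prefix="machine"):
    """Replace every outdated Machine using only create and delete.

    Returns the final list of Machines and a log of the operations performed.
    """
    current = list(machines)
    log = []
    ids = count(1)
    for old in machines:
        if old.version == desired_version:
            continue  # already at the desired state, nothing to do
        new = Machine(name=f"{name_prefix}-new-{next(ids)}", version=desired_version)
        current.append(new)  # create the replacement first ...
        log.append(("create", new.name))
        current.remove(old)  # ... then delete the outdated Machine
        log.append(("delete", old.name))
    return current, log
```

Because the replacement is created before the old Machine is deleted, the group in this sketch never shrinks below its original size during the rollout.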
If you have a deep understanding of Cluster API, at this point you might object that there is a gray area: the OS hosted on the Machines managed by Cluster API.
Cluster API doesn’t take a strong opinion about the OS hosted on the machine, because, based on what we learned in the last few years, this is where different organizations and products can and should take different approaches.
First of all, each organization has different opinions about which OS to use and what trade-offs are acceptable at the OS level.
Second, many organizations do not want to implement special treatment for the OS of the Machines in a Kubernetes cluster. They prefer to have only one supported operating system and use it across different platforms and types of applications, and in most cases this is not an immutable OS.
The many Cluster API extensibility points are the key to dealing with such different requirements and technologies.
By using those extensibility points, it is now possible to perform a carefully validated set of in-place update operations in a safe and fully automated way.
If you’re interested in a deeper dive into this topic, we’ll also be discussing it in the session “In-place Updates with Cluster API: The Sweet Spot Between Immutable and Mutable Infrastructure” [2] at KubeCon EU in Amsterdam!
A few examples
Rolling upgrades
Instead of performing complex procedures to update existing Machines, Cluster API goes through a rollout where new pristine Machines are created to replace old ones.
This approach, derived from the immutability principles, allows you to perform not only Kubernetes upgrades but any kind of change to Machines (infrastructure, OS, and Kubernetes components) by relying on two simple primitives: create Machine and delete Machine.
The process is inherently repeatable and predictable; replacing each Machine ensures that every node reaches the desired state without the surprises often caused by configuration drift.
It is simple, it is robust, it is fast and secure.
On top of that, Cluster API ensures that your availability criteria are always met during a rollout. It also supports chained Kubernetes upgrades spanning multiple Kubernetes minor versions, and worker nodes can skip upgrades to intermediate minor releases whenever allowed by the Kubernetes version skew policy.
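The interplay between chained upgrades and version skew can be sketched in a few lines of Python. This is an illustration only, not how Cluster API computes its upgrade plan. It assumes that minor versions are plain integers (33 for v1.33), that the control plane moves one minor at a time, that workers start at the same minor as the control plane, and that workers may lag the control plane by at most `max_worker_skew` minors. The default of 3 is an assumption, so check the version skew policy for the Kubernetes versions you actually run.

```python
def plan_chained_upgrade(current_minor, target_minor, max_worker_skew=3):
    """Return an ordered list of (component, minor) upgrade steps.

    The control plane is upgraded one minor at a time. Workers are upgraded
    only when the next control plane step would exceed the allowed skew,
    and once more at the end to reach the target.
    """
    if target_minor < current_minor:
        raise ValueError("downgrades are not covered by this sketch")
    if max_worker_skew < 1:
        raise ValueError("workers must be allowed to lag by at least one minor")
    steps = []
    control_plane = workers = current_minor
    while control_plane < target_minor:
        if (control_plane + 1) - workers > max_worker_skew:
            workers = control_plane  # workers catch up before the skew is exceeded
            steps.append(("workers", workers))
        control_plane += 1
        steps.append(("control-plane", control_plane))
    if workers < control_plane:
        workers = control_plane  # final worker upgrade to the target minor
        steps.append(("workers", workers))
    return steps
```

Under these assumptions, going from minor 30 to 33 needs three control plane steps but only a single worker upgrade, straight to 33.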
Remediating unhealthy Machines
How can you handle Machines that consistently fail, e.g., when the Kubernetes Node hosted on the Machine reports its Ready condition as false for more than five minutes?
Cluster API allows you to define MachineHealthChecks to handle such cases, and when automatic remediation kicks in, the system leverages immutability to solve the problem in a fast and efficient way:
A new, pristine replacement Machine is quickly created and the old one is deleted, again leveraging the same two simple primitives: create and delete Machine.
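The health check rule from the example above (Ready condition false for more than five minutes) can be sketched as follows. The `NodeStatus` type and the `machines_to_remediate` function are hypothetical names used for illustration; this is not the MachineHealthCheck API, only the timeout logic behind it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class NodeStatus:
    machine: str                # name of the Machine hosting the Node
    ready: bool                 # current value of the Node's Ready condition
    last_transition: datetime   # when the Ready condition last changed


def machines_to_remediate(statuses, now, timeout=timedelta(minutes=5)):
    """Return the Machines whose Node has been not Ready for longer than the timeout."""
    return [
        s.machine
        for s in statuses
        if not s.ready and (now - s.last_transition) > timeout
    ]
```

Each Machine returned by this check would then be handled with the same create and delete primitives used for a rollout, rather than being repaired in place.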
Avoiding unnecessary rollouts
Machines are complex components, and sometimes you want to make changes that do not require a node drain or pod restart; for example, changing the image registry certificate.
If for any reason you want to perform those kinds of changes without a full Machine rollout, Cluster API provides extensibility points to make this possible by allowing a carefully validated set of in-place update operations in a safe and fully automated way. Even in these cases, configuration drift is avoided by applying the same change consistently across all Machines.
The result is the same UX, combining immutability with the best of mutability, with everything under strict control.
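One way to picture the choice between an in-place update and a full rollout is an allowlist of changes considered safe to apply in place. The Python sketch below is purely illustrative: the field names and the `choose_strategy` function are hypothetical, and in Cluster API this decision is made through its extensibility points, not by a hard-coded set like this one.

```python
# Hypothetical allowlist: changes assumed safe to apply without replacing the Machine.
IN_PLACE_SAFE_FIELDS = frozenset({"registryCertificate"})


def changed_fields(current, desired):
    """Return the set of keys whose values differ between two spec dictionaries."""
    keys = current.keys() | desired.keys()
    return {k for k in keys if current.get(k) != desired.get(k)}


def choose_strategy(current, desired, in_place_safe=IN_PLACE_SAFE_FIELDS):
    """Pick 'none', 'in-place', or 'rollout' for moving from current to desired."""
    changed = changed_fields(current, desired)
    if not changed:
        return "none"
    if changed <= in_place_safe:
        return "in-place"  # every change is on the validated allowlist
    return "rollout"       # anything else falls back to create and delete
```

Applying the same decision to every Machine in a group is what keeps configuration drift out of the picture, even when some changes are made in place.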
Wrapping up
Immutability powers many cloud-native systems, because it enables speed, operational scale, and stronger security practices.
At the same time, combining immutable infrastructure with carefully controlled mutable operations allows organizations to take advantage of other well-known operational approaches where appropriate.
This is the sweet spot between immutable and mutable infrastructure that allows teams to achieve speed, security, and reliability when operating Kubernetes at scale.
References
[1] Cluster API 1.12
[2] KubeCon Session