Why Cloud Native Storage?
Container environments are high-churn, highly dynamic environments. Tens, hundreds, even thousands of containers can be created and deleted every hour. Manually managing storage for containers as they spin up and down is impossible at that churn rate. We need a solution that automates storage provisioning for container workloads.
Cloud Native Storage (CNS) is a vSphere and Kubernetes (K8s) feature that teaches K8s how to provision storage on vSphere on demand, in a fully automated, scalable fashion. It also gives the administrator visibility into container volumes through the CNS UI within vCenter.
Run, monitor, and manage containers and virtual machines on the same platform, in the same way. This simplifies your infrastructure, its lifecycle, and its operations, and lowers costs by using a platform you already know for consistent operations across workloads and across clouds. Spend less time managing infrastructure and more time building apps that provide business value.
What is Cloud Native Storage?
CNS is a term used to describe the storage for Cloud Native Applications (CNAs). These CNAs are typically containerized, and are deployed and managed by a container orchestrator such as Kubernetes, Mesos, or Docker Swarm. The storage consumed by such apps can be ephemeral or persistent, but in most cases it is required to be persistent.
As more developers are adopting newer technologies like containers and Kubernetes to develop, build, deploy, run and manage their applications, it is important that the VMware stack provides all the necessary infrastructure to run such applications.
Since storage infrastructure for stateful applications is critical, we built CNS natively into vSphere. The purpose of CNS is to solve the storage problems CNAs encounter. Currently, CNS supports workloads running on Kubernetes, since Kubernetes has evolved as the de-facto standard for container orchestration.
CNS comprises two parts:
- A Container Storage Interface (CSI) plugin for K8s
- The CNS Control Plane within vCenter
CNS provides K8s with the understanding of how to carry out both storage provisioning and management tasks on vSphere. Additionally, CNS gives the vSphere admin visibility into container usage on the physical infrastructure. This includes mapping container volumes to backing disks and capacity management – just as if they were VM volumes.
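To make this concrete, here is a minimal sketch of what automated provisioning looks like from the Kubernetes side. The provisioner name matches the vSphere CSI driver; the StorageClass and PVC names are made up for illustration:

```yaml
# StorageClass backed by the vSphere CSI driver (part of CNS)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-standard        # illustrative name
provisioner: csi.vsphere.vmware.com
---
# A claim against that class; CNS provisions a backing volume
# on vSphere automatically, with no admin intervention
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc                # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsphere-standard
  resources:
    requests:
      storage: 5Gi
```

When the PVC is created, the CSI plugin asks the CNS Control Plane in vCenter to create the volume, and the resulting disk shows up in the CNS UI mapped back to this claim.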
Check out the demo video we put together way back in January for the beta to get an idea!
So, how did we get here, and what are we doing now?
The Past – vSphere Cloud Provider (VCP)
vSphere Storage for Kubernetes (aka the vSphere Cloud Provider) was the first vSphere storage solution for Kubernetes, which we started way back in 2017. The VCP started out as a project resulting from an internal Hackathon and was made open source and consumable by the public. It has since been iterated on and improved to keep up with customer demands and bug reports.
Before the VCP, volume provisioning had to be done manually: create a VMDK, attach it to the appropriate worker node, format it, and mount it into a container, one volume at a time – untenable at scale. The VCP added the ability to provision and delete storage (VMDKs) and to attach/detach, mount/unmount, and format volumes for K8s nodes autonomously.
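For comparison with the CSI approach, a typical in-tree VCP StorageClass looked something like the sketch below (the class name is illustrative; `kubernetes.io/vsphere-volume` is the in-tree provisioner that shipped inside Kubernetes itself):

```yaml
# In-tree VCP StorageClass (pre-CSI era)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vcp-thin                # illustrative name
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin              # provision thin VMDKs on the datastore
```

Because that provisioner lived inside the Kubernetes codebase, fixing or extending it meant shipping a new version of Kubernetes – a constraint covered in more detail below.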
The VCP is currently in active use by various Kubernetes as a Service (KaaS) offerings: VMware PKS, Red Hat OpenShift, Google Cloud's Anthos, Rancher, and more. Basically, if your KaaS solution runs on vSphere today, it uses the VCP.
The VCP left something to be desired when it came to visibility and troubleshooting of K8s environments on top of vSphere: capacity management and tracking down problems were hard and required many manual steps.
CNS aims to solve these problems through its built-in vCenter dashboards, giving the admin visibility not only into capacity usage and allocation, but also into volume usage and the mapping of volumes to pods and applications within your K8s infrastructure – making the IaaS platform aware of the workloads sitting on top of it.
CNS is the next evolution of the VCP, rewritten from scratch against a completely new interface (the Container Storage Interface – CSI) to be enterprise-ready and conformant with Kubernetes' revised direction for storage.
Why are we moving from VCP to CSI?
The VCP served its purpose: it allowed volumes requested through K8s to be dynamically provisioned on the underlying vSphere infrastructure without any admin intervention. However, it was not without its flaws.
The VCP is what Kubernetes calls a “Cloud Provider”, in that it provides a mechanism for K8s to understand how to talk to the underlying infrastructure.
Kubernetes Cloud Providers come in two flavours – “in-tree” and “out-of-tree”. Essentially, one is shipped in the box with the K8s binaries when you download and install them (in-tree), and the other you install on the cluster after you deploy it (out-of-tree).
Each approach has its pros and cons, but the in-tree model has three major lifecycle constraints:
- You are tied to the K8s release cycle for Cloud Provider upgrades
- To upgrade your Cloud Provider, you must upgrade ALL of K8s
- If your K8s distribution doesn’t support a newer version of K8s, you are stuck with the Cloud Provider code as-is – no changes, no bug fixes, no new features
As a result, the Kubernetes project has decided to remove all Cloud Provider code from its core codebase and to require vendors building integrations to use the out-of-tree model.
This meant the only viable solution going forward was a rewrite, so we took the opportunity to rebuild our storage integration from scratch against the new CSI specification. This makes our storage integration future-proof not just for K8s, but for any container orchestrator that supports the CSI spec. This endeavour has been helped greatly by our acquisition of Heptio, who provided guidance and steering on our approach to K8s storage and open-source work to ensure we are in line with the community.
The Present – Cloud Native Storage
Given CNS is built natively into vSphere, it is important that it is designed from the start to scale as vSphere does and to allow for a plethora of future innovations. The new CNS Control Plane within vCenter brokers all requests from the CSI plugin installed on K8s – this means we are not tied to a single storage backend and can extend the feature with ease in the future.
As part of the rewrite, the CNS Control Plane was built on a vSphere feature known as First Class Disks (FCDs). FCDs allow VMDKs and volumes to exist with lifecycles completely independent of VMs.
An FCD is fully managed as a first-class citizen of vSphere, meaning no more long-forgotten orphaned VMDKs sitting on datastores consuming space!
Additionally, CNS fully supports SPBM provisioning of these volumes. This works best with vSAN, given its native policy-driven storage workflows align exactly with the K8s StorageClass concept. That said, CNS also works with VMFS and NFS, should you use tag-based SPBM policies.
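The SPBM-to-StorageClass mapping can be sketched as follows. This assumes an SPBM policy has already been created in vCenter; the policy name “Gold” and the class name are illustrative:

```yaml
# CSI StorageClass tied to a vCenter SPBM policy; volumes
# provisioned from this class inherit the policy's placement
# and protection rules (e.g. a vSAN policy)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsan-gold               # illustrative name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "Gold"     # pre-created SPBM policy in vCenter
```

Each K8s StorageClass maps one-to-one onto an SPBM policy, so developers pick a class by name while the vSphere admin controls what that class actually means on the storage side.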
CNS is available as part of vSphere 6.7 U3 to anyone with a standard vSphere license or higher – at no extra cost.
Keep an eye out for documentation for CNS installation and spin-up. In the meantime – check out the overview video on YouTube:
If you have any questions – feel free to reach out to @mylesagray on Twitter.
12 comments have been added so far
will it work with vvols as well?
I translated it into Korean.
Just a quick question. How does one backup/restore persistent volumes?
Can vSphere Cloud Provider provision volumes when using tag based SPBM policy? Is this problem fixed?
This is awesome. I am using OpenShift 4.2 on VxRail running vSAN 6.7 U3, but the OpenShift installation uses the older VCP plugin. I get dynamic provisioning so it’s great, but I don’t see those disks under the Container view that you show here, and it does not have the Kubernetes/OpenShift labels. Can you point me to the documentation on how to use the new CSI plugin for vSAN 6.7 U3 with OpenShift 4.2? Looking forward to getting this set up, as this will make it easy for the container team to talk to the vSphere/storage team without having to use the long PV or PVC IDs.