The Kubernetes vSphere CSI driver is becoming more and more prominent as it gradually replaces the original implementation of vSphere storage for Kubernetes, the vSphere Cloud Provider (also known as “Hatchway”). 

In this post, we will review the evolution of the storage driver’s implementation (from in-tree to out-of-tree), dive into the details of the vSphere CSI driver’s architecture and components, and then walk through its installation and use on TKGI and TKG. We will end by highlighting details of the support for the vSphere CSI driver on TKG clusters (for both vSphere 6.7U3 and 7.0) in order to provide an overall understanding of how it is implemented across all VMware products. 

Kubernetes’ implementation from in-tree to out-of-tree 

VMware has been at the forefront of making vSphere the best platform for running Kubernetes. The original implementation of vSphere within Kubernetes has already been used thousands of times over, by customers and by VMware alike. But the in-tree drivers are being deprecated from Kubernetes (for vSphere storage, for example, the goal is to deprecate the in-tree implementation entirely by Kubernetes v1.21), so it’s time to look at what’s next.  

As VMware is one of the highest-ranked active contributing companies to Kubernetes, our continued involvement and leadership within the Kubernetes community is just one indication that vSphere will continue to be the best platform for Kubernetes. Our goal is to enable customers to adopt Kubernetes quickly and seamlessly. As the community progresses towards mainstream adoption of the container storage interface (CSI) and cloud controller manager (CCM) drivers, VMware has surpassed the functionality of the in-tree driver and now offers full support for CSI. 

The Kubernetes project is very much guided by one of the core principles of open source software, namely that you must do what’s in the best interest of everyone involved. VMware makes sure that any change, feature, or functionality is exposed in our drivers. We are responding to the desires of the community by enabling vSphere compatibility for all of the components in the stack, from compute to storage to networking. VMware deeply values the open source community. Not only are we adhering to the decisions made by the community, but we are also contributing to it in a meaningful way by moving this project forward.  

VMware is also embracing the trend towards more open source components. The vSphere CSI and vSphere CCM drivers are available on GitHub along with documentation on installation and configuration (CSI and CCM). Not only are these drivers available for public use, they are now supported components within our commercial offerings. This gives anyone the ability to see exactly what code is being merged and how the features are implemented, and provides an added avenue of oversight for new requests.  

To understand the overall context behind the in-tree volume plugins and the decision made by the community to move forward with the out-of-tree CSI implementation, this post by the SIG storage team provides all the necessary information.    

vSphere CNS CSI architecture and components 

The Kubernetes vSphere CSI driver is implemented under an architecture called vSphere CNS CSI, which is comprised of two key components: 

  • The CNS in the vCenter Server 

  • The vSphere volume driver in a Kubernetes cluster 

The CNS control plane introduces the concept of volumes, such as container volumes and persistent volumes, in vSphere. It is the storage control plane for container volumes, responsible for managing their lifecycles, including operations such as create, read, update, and delete. It is also responsible for managing volume metadata, for snapshot and restore, for volume copy and clone, as well as for monitoring the health and compliance of volumes. These volumes are independent of the virtual machine lifecycle and have their own identity in vSphere.  

In a Kubernetes cluster, CNS provides a volume driver that has two subcomponents—the CSI driver and the syncer. The CSI driver is responsible for volume provisioning; attaching and detaching the volume to VMs; mounting, formatting, and unmounting volumes from the pod within the node VM; and so on. The CSI driver is built as an out-of-tree CSI plugin for Kubernetes. The syncer is responsible for pushing PV, PVC, and pod metadata to CNS. 

 

The following pods are deployed in a Kubernetes cluster:

The vsphere-csi-controller pod (deployed as a K8s StatefulSet), which 

  • Plays an intermediate role between the K8s api-server and CNS component on vCenter 

  • Watches k8s events related to PVC/PV objects (creation, deletion) 

  • Invokes calls to CNS for volume operations (create/delete/update/attach/detach) 

  • Syncs the metadata of PVC/PV objects with the CNS internal database

The vsphere-csi-node pod (deployed as K8s DaemonSet), which 

  • Interacts with the local kubelet on the worker node 

  • Performs the operations that give pods access to volumes (format/mount/unmount) 

Only the vsphere-csi-controller pod needs to interact with vCenter, to communicate with the embedded CNS component. 
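On a cluster where the driver is already installed, one quick way to see both workloads is a query along these lines (the grep pattern simply matches the names used above):

$ kubectl get statefulset,daemonset -n kube-system | grep vsphere-csi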

Now let’s go one layer deeper and dissect all the containers living in the above two pods. 

(Disclaimer: Much of the information shown below comes from the blog post “Supercharging Kubernetes Storage with CSI” on the VMware Tanzu blog portal.) 

 

The following are containers inside the vsphere-csi-controller pod (output is taken from the kubectl describe pod command, then filtered). 

# kubectl describe pod vsphere-csi-controller -n kube-system

 csi-attacher: 

    Container ID:  docker://ada8e66ef776adc8884e3a26af3381668766ce31a3e320305643445525019e04 

    Image: quay.io/k8scsi/csi-attacher:v1.1.1 

  vsphere-csi-controller:   

    Container ID:  docker://15a16675a58f6ee6beabdc5324851cdf059820c554bf98e5b76416e0f26744d6 

    Image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.2 

 liveness-probe: 

    Container ID:  docker://57c9498613fcb30b67963797a28a125bae2bb3d21e6ff9d18b521236be54a576 

    Image: quay.io/k8scsi/livenessprobe:v1.1.0 

  vsphere-syncer: 

    Container ID:  docker://4196c2be0f21cc34b15677715b9edd529fd031c8fc667380847635180fbfe553 

    Image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.2 

  csi-provisioner: 

    Container ID:  docker://9eab0125e0486500301c6c799631cd776b34ca7eb4aa2f2e54624f90bed2fd76 

    Image: quay.io/k8scsi/csi-provisioner:v1.2.2 

Here are the details of each container. 

  • csi-attacher: 

Watches the API server for VolumeAttachment objects, which are an indication to initiate a gRPC call to the vsphere-csi-controller (CSI driver) container to attach a new volume to a specified node. When the object is removed, it initiates a call to the CSI driver to detach the volume. 

  • vsphere-csi-controller (CSI driver): 

Interacts with the CNS component on vCenter to invoke API calls for volume operations: Create/Delete/Update/Attach/Detach. 

  • liveness-probe: 

Monitors the health of the CSI driver and exposes it to Kubernetes so the pod can be restarted if it becomes unhealthy. 

  • vsphere-syncer: 

Reports K8s metadata information related to PV and PVC back to CNS for display. 

  • csi-provisioner:  

Handles volume creation and deletion requests by watching the API server for new PersistentVolumeClaim (PVC) objects that are annotated for CSI. Upon discovery, it uses information from the PVC to initiate the creation of a new volume by delegating volume operations to its co-located vsphere-csi-controller (CSI driver) container using gRPC calls. Conversely, when the PVC is removed, this component automatically sends volume operation calls to the driver to delete the volume. (A couple of commands for observing these objects on a live cluster follow this list.) 
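To watch the csi-attacher and csi-provisioner at work on a live cluster, you can list the VolumeAttachment objects the attacher reacts to and describe a claim while it is being provisioned (the claim name below is only a placeholder):

$ kubectl get volumeattachments

$ kubectl describe pvc my-claim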

The following containers are found inside the vsphere-csi-node pod (output is taken from the kubectl describe pod command, then filtered). 

# kubectl describe pod vsphere-csi-node -n kube-system

node-driver-registrar: 

    Container ID:  docker://8f8c7893ebf4282faf33e0eafe7580a0db7ef4ce19a722d6ea32b2f929f4a4a8 

    Image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0 

  vsphere-csi-node: 

    Container ID:  docker://d0e2c5bf08d507a4dd077659fcc331a4f9a09ae6cb4f7ee9622f502cefeebc3b 

    Image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.2 

liveness-probe: 

    Container ID:  docker://09f24ce44e926b505ca3843fa42ca27a7a88299e2a1fe47c482fb28429345b4e 

    Image: quay.io/k8scsi/livenessprobe:v1.1.0 

Here are the details of each container. 

  • node-driver-registrar:  

Responsible for registering and exposing the Identity service endpoint of the vsphere-csi-node (CSI driver) to the kubelet using an internal plugin registration mechanism. Once registered, the kubelet can call the driver using gRPC over Unix domain sockets to handle node-specific volume operations such as mount. 

  • vsphere-csi-node (CSI driver): 

Performs the operations that give pods access to volumes (format/mount/unmount). 

  • Internal Kubelet plugin: 

Runs internally as part of the kubelet and is opaque to driver authors and CSI users. It is responsible for coordinating volume operations that require access to the node’s filesystem. For instance, the CSI kubelet plugin receives requests for mounting volumes to be made available for workload pods. After preparing the mount point, it delegates the operation to the vsphere-csi-node (CSI driver) container to complete the operation.

  • liveness-probe: 

Monitors the health of the CSI driver and exposes it to Kubernetes so the pod can be restarted if it becomes unhealthy. 

Deploying the vSphere CSI driver on a TKGI-based K8s cluster 

To deploy the vSphere CSI driver on a TKGI-based K8s cluster, there are a number of prerequisites:

  • vSphere 6.7U3 or later is installed. 

  • TKGI 1.7.0 or later is installed.

  • The TKGI plan used to create the K8s cluster has the “allow privileged” option enabled. 

  • The vsphere-csi-controller pod has access to vCenter. In the case of Flannel, that means the K8s worker nodes have access to vCenter. In the case of NSX-T, it means the floating IP allocated to the SNAT rule for the kube-system namespace on the T0 router (or T1, if a shared T1 model is used) is able to reach vCenter. 

As of TKGI version 1.7.0, the installation is manual (the plan is to have the process fully automated in upcoming releases) and involves five steps: 

Step 1: Deploy a K8s cluster using the proper TKGI plan 

Step 2: Create a CSI secret 

Step 3: Create the ServiceAccount, ClusterRole, and ClusterRoleBinding objects

Step 4: Install the vSphere CSI controller StatefulSet and the CSI driver 

Step 5: Install the vSphere CSI node Daemonset 

All YAML manifest files are located here and the supported version of the vSphere CSI driver is 1.0.2. For more information, this GitHub page contains all the detailed steps. 

Let’s dive into each step.

Step 1: Deploy a K8s cluster using the proper TKGI plan 

Instantiate a new K8s cluster by using the following command: 

$ pks create-cluster pks-cluster-cns-csi --external-hostname pks-cluster-cns-csi --plan large --num-nodes 3 

As a reminder, the TKGI plan ‘large’ has the ‘allow privileged’ attribute enabled. 
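Before running the kubectl commands in the following steps, fetch the cluster credentials so that kubectl points at the new cluster. With the TKGI/PKS CLI this typically looks like the following (the exact CLI binary name and the context name can vary by version; here the context is assumed to match the cluster name):

$ pks get-credentials pks-cluster-cns-csi

$ kubectl config use-context pks-cluster-cns-csi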

Step 2: Create a CSI secret 

Create the file csi-vsphere.conf as follows:  

[Global] 
cluster-id = "pks-cluster-cns-csi"

[VirtualCenter "10.1.1.1"] 
insecure-flag = "true" 
user = "[email protected]" 
password = "password" 
port = "443" 
datacenters = "vSAN_Datacenter" 

Customize the content of the file based on your environment. The individual fields are defined as follows: 

  • cluster-id: cluster ID of the K8s cluster (must be unique for every K8s cluster where CNS CSI is installed) 

  • VirtualCenter: IP or FQDN of the vCenter instance 

  • User/password: credentials to access vCenter 

  • Datacenters: name of the vSphere datacenter 

This document lists the vCenter roles and privileges required for CNS and CSI, which grant the average vCenter user minimum privileges for all vSphere CSI driver-related operations. 

To create a K8s secret leveraging the above file, apply the following command: 

$ kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=kube-system  

secret/vsphere-config-secret created
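To double-check that the secret exists (and, optionally, to remove the plaintext configuration file once it is no longer needed), you can run:

$ kubectl get secret vsphere-config-secret -n kube-system

$ rm csi-vsphere.conf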

Step 3: Create the ServiceAccount, ClusterRole, and ClusterRoleBinding objects

Create those three K8s objects by applying the following YAML file: 

$ kubectl apply -f vsphere-csi-controller-rbac.yaml

serviceaccount/vsphere-csi-controller created

clusterrole.rbac.authorization.k8s.io/vsphere-csi-controller-role created

clusterrolebinding.rbac.authorization.k8s.io/vsphere-csi-controller-binding created
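For reference, here is a heavily abbreviated sketch of what a manifest like vsphere-csi-controller-rbac.yaml contains. The object names match the output above, but the rules shown are illustrative only; the file shipped in the repository is the authoritative version.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: vsphere-csi-controller
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: vsphere-csi-controller-role
rules:
  # Illustrative subset: the real manifest also covers nodes, events,
  # secrets, CSI node objects, and more.
  - apiGroups: [""]
    resources: ["persistentvolumes", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "volumeattachments"]
    verbs: ["get", "list", "watch", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: vsphere-csi-controller-binding
subjects:
  - kind: ServiceAccount
    name: vsphere-csi-controller
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: vsphere-csi-controller-role
  apiGroup: rbac.authorization.k8s.io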

Step 4: Install the vSphere CSI controller StatefulSet and CSI driver 

$ kubectl apply -f vsphere-csi-controller-ss-data-1.yaml

statefulset.apps/vsphere-csi-controller created

csidriver.storage.k8s.io/csi.vsphere.vmware.com created

The most notable result from this command is the creation of the vSphere CSI controller pod (deployed as a StatefulSet): 

$ kubectl get pod -n kube-system 


NAME                              READY   STATUS    RESTARTS   AGE 
<snip> 
vsphere-csi-controller-0          5/5     Running   0          2m53s 

At this point, the vSphere CSI driver registration has been persisted in the etcd key/value store, and the standard K8s CSIDriver object is now instantiated. 

$ kubectl get csidriver 

NAME                     CREATED AT 

csi.vsphere.vmware.com   2020-04-01T17:56:59Z
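If the controller pod does not reach the Running state, or if volume operations fail later on, the logs of the driver container (names taken from the pod description shown earlier) are the first place to look for vCenter connectivity or credential problems:

$ kubectl logs vsphere-csi-controller-0 -n kube-system -c vsphere-csi-controller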

Step 5: Install the vSphere CSI node Daemonset 

The final step is to create the vSphere CSI node pod (one per K8s worker node). This is done by applying the following YAML file: 

$ kubectl apply -f vsphere-csi-node-ds-data.yaml 

daemonset.apps/vsphere-csi-node created

Let’s quickly check that one vSphere CSI node pod per worker node exists: 

$ kubectl get pod -n kube-system 

NAME                              READY   STATUS    RESTARTS   AGE 

vsphere-csi-controller-0          5/5     Running   0          6m37s 

vsphere-csi-node-22wbz            3/3     Running   0          35s 

vsphere-csi-node-4cs9k            3/3     Running   0          35s 

vsphere-csi-node-6asxw            3/3     Running   0          35s 
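Each worker node should also have registered the driver in its CSINode object, which can be verified with:

$ kubectl get csinodes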

Using the vSphere CSI driver on the TKGI-based K8s cluster 

To use the vSphere CSI driver on the TKGI-based K8s cluster, create a K8s storage class definition: 


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cassandra-sc-csi
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
parameters:
  datastoreurl: "ds:///vmfs/volumes/vsan:52d8eb4842dbf493-41523be9cd4ff7b7/"

Note that the provisioner must be set to csi.vsphere.vmware.com. The datastoreurl parameter is the URL that vCenter assigns to the target datastore (it is shown on the datastore’s Summary page in the vSphere Client). 
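Assuming the manifest above is saved as cassandra-sc-csi.yaml (a file name chosen here purely for illustration), apply it and confirm the storage class is registered and marked as the default:

$ kubectl apply -f cassandra-sc-csi.yaml

$ kubectl get storageclass cassandra-sc-csi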

And that’s all! 

Simply create a PVC referencing the storage class to start leveraging the vSphere CSI driver; a minimal example follows. All the PVs are created as First Class Disks (FCDs) and stored in the directory named ‘fcd’ inside the designated datastore. 
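Here is a minimal sketch of such a PVC; the claim name and size are placeholders, and storageClassName could even be omitted because the class above is annotated as the default:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cassandra-sc-csi
  resources:
    requests:
      storage: 1Gi

Once the claim is bound (kubectl get pvc), the corresponding PV appears both in the cluster and as a container volume in vCenter.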


Finally, thanks to CNS, you can see all the container volumes in the vSphere Client by navigating to: 

vSphere cluster → Monitor → Cloud Native Storage → Container Volumes 

vSphere CSI driver support on TKG clusters 

The vSphere CSI driver is automatically installed on TKG thanks to Cluster API, which automates the overall process. All the concepts, architecture, and implementation we have seen so far remain the same in the case of TKG. Each K8s cluster runs one vsphere-csi-controller pod (deployed as a StatefulSet) and one vsphere-csi-node pod per worker node (deployed as a DaemonSet). Each of these pods embeds the same containers listed previously.  
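As a quick sanity check on a TKG workload cluster, the same objects can be confirmed directly (assuming the driver runs in kube-system, as it does on the clusters shown in this post):

$ kubectl get csidriver

$ kubectl get pods -n kube-system | grep vsphere-csi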

For instance, let’s take a look at the pods and containers in the case of TKG version 1.0 on vSphere 6.7U3 (TKG workload cluster). We’ll start with the containers in the vsphere-csi-controller pod.


csi-attacher: 

    Container ID:  containerd://5315a8525d85e14120122f351892e39822c0c344b854b78bbce28fb257e410c3 

    Image: registry.tkg.vmware.run/csi/csi-attacher:v1.1.1_vmware.7 

  vsphere-csi-controller:   

    Container ID:  containerd://78e68240a46c28985b7cac3cfc3b12855e92961a7d15009f4ffcb29decc1f385 

    Image: registry.tkg.vmware.run/csi/vsphere-block-csi-driver:v1.0.2_vmware.1 

  liveness-probe: 

    Container ID:  containerd://54023d3443e6fc0a2cbd8c5aeaf1bec788c902e8c4c7aef11413df36a5f207aa 

    Image: registry.tkg.vmware.run/csi/csi-livenessprobe:v1.1.0_vmware.7 

  vsphere-syncer: 

    Container ID:  containerd://67cc0abb8fb511c3cdc1e89e5a7753f6c73b6de6f6f92b7aa348658911df5069 

    Image: registry.tkg.vmware.run/csi/volume-metadata-syncer:v1.0.2_vmware.1 

  csi-provisioner: 

    Container ID:  containerd://73fdf6f38e8c5e27d0283bd92c1726e4aeb3633605ef25332015b92acfbd8d2e 

    Image: registry.tkg.vmware.run/csi/csi-provisioner:v1.4.0_vmware.2

And here are the containers for the vsphere-csi-node pod.


node-driver-registrar: 

    Container ID:  containerd://421246337e64f4230f73e69fb68ef5e0f6c7fbf8931845da27f269f6a4f67b6e 

    Image: registry.tkg.vmware.run/csi/csi-node-driver-registrar:v1.1.0_vmware.7 

  vsphere-csi-node: 

    Container ID:  containerd://46a9031e2b7a3f6f62740ba398ec02e190fbe5526d3f7aaf652e9ac0ee5613c2 

    Image: registry.tkg.vmware.run/csi/vsphere-block-csi-driver:v1.0.2_vmware.1 

liveness-probe: 

    Container ID:  containerd://ee64af5d681b62462b1383f8e9a722c5950a2df05f370fb5bb0f47dc28df105e 

    Image: registry.tkg.vmware.run/csi/csi-livenessprobe:v1.1.0_vmware.7 

Note that TKG version 1.0 also uses vSphere CSI Driver v1.0.2. 

vSphere CSI driver-supported capabilities 

For a list of the most-up-to-date supported capabilities provided by the vSphere CSI driver, refer to the support features matrix. Here is a sample:

[Screenshot: vSphere CSI driver supported features matrix]  

*features newly available in vSphere 7.0/CSI 2.0 

 

Useful links 

Supercharging Kubernetes Storage with CSI 

Kubernetes vSphere CSI Driver

Getting Started with VMware Cloud Native Storage  

Container Storage Interface (CSI) Driver for vSphere