vSphere 7 with Kubernetes enables operations teams to deliver both infrastructure and application services as part of the core platform. The Network service provides automation of software-defined networking to both the Kubernetes clusters embedded in vSphere and Tanzu Kubernetes clusters deployed through the Tanzu Kubernetes Grid Service for vSphere.
In Part 1 of this series, I looked at Supervisor Cluster networking; I recommend that blog post and demonstration as a prerequisite for getting the most out of Part 2. In this post, I explore the automated networking of Tanzu Kubernetes clusters through the vSphere Network service, including a video walkthrough at the end.
vSphere 7 with Kubernetes Services
In Part 1, I discussed the services that are enabled on the Supervisor Cluster. The Tanzu Kubernetes Grid Service for vSphere provides lifecycle management for DevOps teams that want to provision their own Tanzu Kubernetes clusters. Not only does the vSphere Network service orchestrate the network infrastructure for the cluster nodes using NSX, it also implements Calico as the network overlay within the cluster itself. For a technical overview of vSphere 7 with Kubernetes, check out this video.
VMware Cloud Foundation Services
Kubernetes Custom Resources
Kubernetes is not just an orchestrator of containers; it is also an extensible platform that allows the definition of custom resources that can be managed through the Kubernetes API. A custom resource is an endpoint in the API that holds the configuration for objects of a certain Kind. It is an extension of the API for objects that wouldn't be in a default installation. Through the API you can, among other things, create, update, delete, and get these objects. On their own, these resources don't do anything other than let you store and retrieve information. If you want to do something with that data, you must define a controller that watches for changes in a custom resource and takes action. Example: the vSphere Virtual Machine service is made up of a set of custom resources and controllers. When a VirtualMachine resource is created in the Supervisor Cluster, the virtual machine controller is responsible for reconciling that custom resource into an actual VM by calling the vCenter API.
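To make this concrete, here is a minimal sketch of a custom resource definition. The group and names below are hypothetical and only for illustration; the Virtual Machine service ships its own definitions, which are more elaborate than this.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: virtualmachines.example.vmware.com   # hypothetical name: <plural>.<group>
spec:
  group: example.vmware.com                  # hypothetical API group for illustration
  scope: Namespaced
  names:
    kind: VirtualMachine
    singular: virtualmachine
    plural: virtualmachines
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            x-kubernetes-preserve-unknown-fields: true   # keep the spec open-ended for this sketch

Once a definition like this is registered, the API server stores and serves objects of kind VirtualMachine, but nothing happens to them until a controller reconciles them.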
I have gone through this description because the enablement of Tanzu Kubernetes clusters, and the associated networking, is built on the creation and reconciliation of many custom resources and their corresponding controllers. I previously created a video that describes the overall process for deploying TK clusters, along with some of the custom resources that are used.
The TK cluster is orchestrated by a set of custom resources that implement Cluster API. Cluster API is an open-source project in the Kubernetes Cluster Lifecycle SIG that manages the lifecycle of Kubernetes clusters using the Kubernetes API. That API runs in what the Cluster API documentation refers to as the management cluster; in our environment, the management cluster is the Supervisor Cluster. The Cluster API implementation includes many custom resources. I am summarizing the capability by referring to the following three controllers, plus the NSX Container Plugin, when in fact the implementation includes many more.
Tanzu Kubernetes Cluster controller watches for a custom resource called TanzuKubernetesCluster and takes the steps needed to create the set of custom resources expected by Cluster API. This resource is the easiest way to get a Kubernetes cluster: you apply a straightforward YAML specification (a sketch follows this list).
CAPW controller is short for Cluster API for Workload Control Plane (WCP) controller. WCP is how VMware engineers refer to the capability enabled through the Supervisor Cluster. The CAPW controller is the infrastructure-specific implementation of Cluster API.
VM Service controller watches for custom objects created by CAPW and uses those specifications to create and configure the VMs that make up the TK cluster.
NSX Container Plugin (NCP) is a controller running as a Kubernetes pod in the Supervisor Cluster control plane. It watches for network resources added to etcd through the Kubernetes API and orchestrates the creation of corresponding objects in NSX.
Note that each of these controllers runs as a pod in the control plane of the Supervisor Cluster.
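As a concrete example, here is a minimal sketch of the kind of TanzuKubernetesCluster specification a DevOps user might apply to the Supervisor Cluster. The cluster name, namespace, VM class, storage class, and version are placeholders for whatever is defined in your environment.

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkc-demo                        # placeholder cluster name
  namespace: demo-namespace             # a Supervisor Namespace you have access to
spec:
  distribution:
    version: v1.16                      # a Tanzu Kubernetes release available in your environment
  topology:
    controlPlane:
      count: 3
      class: best-effort-small          # VM class defined in your environment
      storageClass: demo-storage-policy # storage policy assigned to the namespace
    workers:
      count: 3
      class: best-effort-small
      storageClass: demo-storage-policy

Applying this one manifest against the Supervisor Cluster is enough to kick off the chain of controllers described above, including all of the networking that follows.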
Tanzu Kubernetes Cluster Node Networking
Virtual Network Custom Resource
As the custom resources associated with the cluster nodes are being reconciled, CAPW creates a VirtualNetwork custom resource that holds the network configuration information for the cluster.
NCP watches for that resource and reconciles the active state of the environment with the desired state defined in it: NCP calls the NSX API to create a new network segment, Tier-1 Gateway, and IP subnet for the cluster.
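Conceptually, the resource that NCP consumes looks something like the sketch below. The API group and field layout here are illustrative only, not the exact NCP schema; the point is that the desired network for the cluster is expressed as a namespaced Kubernetes object.

apiVersion: vmware.com/v1alpha1   # API group used by NCP; treat the exact schema as illustrative
kind: VirtualNetwork
metadata:
  name: tkc-demo-vnet             # hypothetical name derived from the cluster name
  namespace: demo-namespace       # the Supervisor Namespace that owns the cluster
spec: {}                          # desired state; NCP reports the realized NSX segment,
                                  # Tier-1 Gateway, and subnet back in the resource status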
Virtual Network Interfaces Custom Resources
As the VM Service controller creates the virtual machines for the cluster, it creates a VirtualNetworkInterface resource for each of the VMs. NCP creates the interfaces on the previously created network segment and updates the information in the VirtualNetworkInterface resource. The VM Service controller uses that information to configure the virtual NICs on the VMs with the appropriate IP, MAC, and gateway information.
VM Network Interfaces attached to NSX Segment and T1-Gateway
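Each node NIC is represented by its own object along these lines. The sketch below is hypothetical, with an illustrative field linking the interface to the cluster's VirtualNetwork; the real schema may differ.

apiVersion: vmware.com/v1alpha1        # same illustrative NCP API group as above
kind: VirtualNetworkInterface
metadata:
  name: tkc-demo-control-plane-0-nic   # hypothetical name, one per cluster node VM
  namespace: demo-namespace
spec:
  virtualNetwork: tkc-demo-vnet        # illustrative reference to the cluster's VirtualNetwork
# After reconciliation, NCP writes the assigned IP, MAC, and gateway into the resource,
# and the VM Service controller uses those values to configure the node's virtual NIC.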
Ingress into Tanzu Kubernetes Cluster
Now that our cluster node VMs are created and have node level network access, we need to configure ingress into our cluster. The IPs that we just assigned are part of the pod CIDR that was defined at Supervisor cluster creation and are not routable from outside the cluster.
In order to get Ingress, we must create a load balancer with virtual servers that are configured with the endpoints of the control plane nodes. The load balancer gets an IP from the Ingress CIDR, which is also defined at Supervisor Cluster creation.
The CAPW controller creates a VirtualMachineService custom resource, and the VM Service controller creates a LoadBalancer custom resource and a Kubernetes Load Balancer Service. NCP translates the LoadBalancer custom resource into an NSX load balancer, and the Kubernetes Load Balancer Service into the NSX virtual servers that hold the endpoint information. Those endpoints are then updated in the Kubernetes Load Balancer Service.
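As a rough illustration, a VirtualMachineService of type LoadBalancer for the control plane might look like the sketch below. The names, selector labels, and the assumption that port 6443 is exposed are placeholders rather than exactly what CAPW generates.

apiVersion: vmoperator.vmware.com/v1alpha1   # VM Service API group; schema shown as a sketch
kind: VirtualMachineService
metadata:
  name: tkc-demo-control-plane-service       # hypothetical name
  namespace: demo-namespace
spec:
  type: LoadBalancer          # reconciled into an NSX load balancer and virtual server
  ports:
  - name: apiserver
    port: 6443                # Kubernetes API server port on the control plane nodes
    targetPort: 6443
    protocol: TCP
  selector:                   # illustrative labels selecting the control plane node VMs
    cluster-name: tkc-demo
    role: control-plane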
If you are new to custom resources in Kubernetes, this is a lot of information. The video at the bottom of this blog will show you a little about how it works.
Tier-1 Gateway and Load Balancer
Overlay Networking with Calico
Now our cluster nodes have connectivity, and a load balancer allows traffic to be routed to the control plane nodes of our cluster, but there is not yet any connectivity to pods or services defined within the cluster. Tanzu Kubernetes clusters use the Container Network Interface (CNI), a plugin framework that allows for multiple providers, to connect network providers to Kubernetes networking. Initially, Calico is the supported CNI for TK clusters; additional CNIs will be added in the future.
Calico runs an agent on each of the nodes in the TK cluster. The agent has two primary components: Felix and Bird. Felix is responsible for updating the routing table on the host and, more generally, anything else related to providing connectivity for pods or services on that host. Bird is a Border Gateway Protocol (BGP) client; it is responsible for advertising the routes programmed by Felix on a particular node to all of the other nodes in the cluster.
Felix and Bird Update Tables and Advertise Routes
Pod to Pod Communication
One of the requirements for Kubernetes networking is that communication between pods in the same cluster happens without NAT. Calico is implemented here with IP-in-IP tunneling enabled. When pods are created, they get a virtual interface (calixxxx) and an IP from a subnet assigned to the node. Felix updates the routing tables with the IP subnets for each node in the cluster.
For pod communication between nodes, the traffic is routed to the tunl0 interface and encapsulated with a new header containing the IP of the destination node. The node is also configured as a layer 3 gateway, so the tunl0 traffic goes out the NSX virtual interface and across the NSX segment assigned to the cluster. It is then routed to the appropriate node, decapsulated at tunl0, and finally delivered to the pod through its calixxxx veth pair. NAT only occurs for traffic headed out of the cluster to an external network.
Pod to Pod across nodes
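For reference, IP-in-IP encapsulation and outbound NAT in Calico are governed by its IPPool resource. A pool along the following lines produces the behavior described above; the CIDR and block size are illustrative rather than the exact values a TK cluster ships with.

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16    # illustrative pod CIDR; each node is assigned a block from this pool
  blockSize: 26           # size of the per-node address block
  ipipMode: Always        # encapsulate pod-to-pod traffic between nodes in IP-in-IP (tunl0)
  natOutgoing: true       # NAT is applied only to traffic leaving the cluster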
Providing Ingress to pods running on TK cluster
TK clusters use the Load Balancer created at cluster deployment to provide Ingress from an external network to pods running in the cluster. Users create a Kubernetes Load Balancer Service on the cluster to provide that Ingress.
Because a user with Namespace edit privilege has the cluster admin role on the TK cluster, it might be possible for them to access any credentials stored there. For that reason, we don't want to access vCenter or NSX directly from the TK cluster. Activities that require such access, like creating NSX virtual servers or vSphere storage volumes, are proxied to the Supervisor Cluster.
This is the process for proxying those resources: a TK cluster cloud provider runs on the control plane of the TK cluster. When the Kubernetes Load Balancer Service is created on the TK cluster, the cloud provider calls the Kubernetes API on the Supervisor Cluster to create a VirtualMachineService custom resource. As described previously, the VirtualMachineService is reconciled into a new Kubernetes Load Balancer Service on the Supervisor Cluster. NCP then reconciles that Service into the NSX virtual server and endpoints needed to access the service. The result is that the user accesses the service through a new IP on the original cluster Load Balancer.
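From the user's point of view, the trigger for all of this is a plain Kubernetes Service of type LoadBalancer created on the TK cluster, something like the sketch below; the service name, labels, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: demo-app-lb        # placeholder service name
spec:
  type: LoadBalancer       # prompts the cloud provider to create a VirtualMachineService
                           # on the Supervisor Cluster, which NCP turns into an NSX virtual server
  selector:
    app: demo-app          # placeholder label matching the pods to expose
  ports:
  - port: 80               # port exposed on the load balancer IP
    targetPort: 8080       # placeholder container port
    protocol: TCP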
Let’s see it in action
That was a lot to retain in a single blog post. I look at this in more detail in the video. For more information on vSphere 7 with Kubernetes, check out our product page: https://www.vmware.com/products/vsphere.html