In this post, I am going to take Tanzu Kubernetes Grid, VMware’s Cluster-API based Kubernetes solution for on-prem and multi-cloud environments, for a test drive.
Tanzu Kubernetes Grid has come a long way since its first release in April 2020. The latest version, 1.3, adds a lot of new features and improvements, many of which I will discuss in detail.
Release dates and feature list
With version 1.3, Tanzu Kubernetes Grid is on its sixth iteration.
Take a look at this overwhelming list of new Tanzu Kubernetes Grid 1.3 features first (I will be focusing on a subset of these topics in the sections below).
- Updated to Kubernetes 1.20.4
- Ubuntu 20.04 node images distributed for all supported infrastructure providers
- Tanzu CLI replaces Tanzu Kubernetes Grid CLI
- Ability to update vCenter credentials post-creation
- K8s Service of type LoadBalancer with NSX Advanced Load-Balancer Essentials for vSphere
- Pinniped/Dex for OIDC/LDAPS integration
- image-builder and dependencies packaged as a container
- Create your own Photon OS, Ubuntu 20.04, Amazon Linux 2, RHEL7 images
- Automatic installation of core add-ons with new add-ons management
- Includes Antrea, Dex, Pinniped, (vSphere) CPI, (vSphere) CSI, metrics-server
- New clusters support automatic upgrades for core add-ons
- HTTP/S proxy support
- Disaster recovery of workload clusters using Velero
- Register management cluster in Tanzu Mission Control
- Audit logging enabled
- Kubernetes API server audit logs
- Virtual machine (VM)-level audit logs (via auditd)
- Audit logs can be forwarded via the Fluent Bit log forwarder extension
- Fluent Bit Syslog output plugin (enables forwarding logs to vRLI on-prem)
- Metrics Server installed by default on management and workload clusters
- Enables kubectl top nodes
- Enables kubectl top pods
- New CRD TanzuKubernetesRelease
- external-dns as an in-cluster extension
More details and a quick product snapshot can be found in the Release Notes.
Tanzu CLI
A major change in this release is the introduction of the Tanzu command-line interface (CLI), which will unify operations across solutions such as Tanzu Kubernetes Grid and Tanzu Mission Control. Unpacking the Tanzu Kubernetes Grid tar file creates the cli directory, which includes the Tanzu CLI binary, Tanzu Kubernetes Grid plugins, and other executables (i.e., kapp, imgpkg, kbld, ytt).
Let’s start by installing the Tanzu CLI and plugins on an Ubuntu 20.04 Linux system using the following commands:
$ tar xvf tanzu-cli-bundle-linux-amd64.tar
$ sudo mv cli/core/v1.3.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu
$ tanzu plugin install --local cli all
$ ls -al ~/.local/share/tanzu-cli/
$ source <(tanzu completion bash)
$ tanzu plugin list
It is highly recommended to add the Tanzu completion command to your ~/.bashrc or ~/.bash_profile file for a better user experience. The new tanzu CLI uses a NOUN VERB syntax and is organized around the plugins, but otherwise it is very similar to the old Tanzu Kubernetes Grid CLI. The documentation includes an exhaustive command reference and comparison table.
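To give you a feel for the NOUN VERB pattern, here are a few everyday commands; the old-CLI equivalents in the comments are rough mappings from memory, so consult the comparison table for the authoritative list:
$ tanzu management-cluster create --ui      # roughly the old: tkg init --ui
$ tanzu cluster list                        # roughly the old: tkg get cluster
$ tanzu cluster create tkg-cluster-1 -f tkg-cluster-1.yaml   # roughly the old: tkg create cluster
$ tanzu cluster scale tkg-cluster-1 -w 3                     # roughly the old: tkg scale cluster
$ tanzu kubernetes-release get              # new in 1.3, see the TanzuKubernetesRelease examples at the end of this post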
As a first step, we will create a Tanzu Kubernetes Grid management cluster. The following sample command will create the cluster on a Linux jump box (with IP address 192.168.110.100) and expose the configuration UI on port 9080:
$ tanzu management-cluster create --ui --bind 192.168.110.100:9080 --browser none
The configuration workflow is the same as with older Tanzu Kubernetes Grid versions; you just get some additional configuration options. And in cases where you’ve already configured a management cluster, you can now restore the cached data.
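As far as I can tell, the UI workflow also writes your selections to a cluster configuration file (in my environment under ~/.tanzu/tkg/clusterconfigs/), so the same deployment can be repeated headlessly by pointing the CLI at that file:
$ ls ~/.tanzu/tkg/clusterconfigs/
$ tanzu management-cluster create --file ~/.tanzu/tkg/clusterconfigs/<generated-name>.yaml -v 6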
NSX Advanced Load Balancer integration
The NSX Advanced Load Balancer (ALB), VMware’s enterprise-grade, software-defined load balancing and web application firewall (WAF) solution, can be used for L4 load balancing in Tanzu Kubernetes Grid workload clusters (i.e., you can create Kubernetes services of type LoadBalancer). To enable this integration, you have to configure the controller IP/DNS name, username, and password first. Once the connection is verified, you can choose the vSphere Cloud you configured in the controller and a Service Engine (SE) group in that cloud. Next, enter the VIP network name and CIDR you configured in the Infrastructure -> Networks section of the NSX ALB.
Since the default self-signed certificate of the controller is not using a Subject Alternative Name (SAN), you have to replace it before you can add it in the Tanzu Kubernetes Grid configuration UI. This can easily be done with the NSX ALB web interface. Go to Templates -> Security -> SSL/TLS Certificates and choose Create Controller Certificate. After you create the new certificate, click the download icon, copy the certificate from the export dialog, and paste it in the Controller Certificate Authority box of the Tanzu Kubernetes Grid GUI.
Last but not least, you have to tell the NSX ALB to use this new certificate. Go to Administration -> Settings -> Access Settings, edit System Access Settings, delete all configured (default) SSL/TLS certificates, and add the custom certificate you created above. Log in to the controller again and check that the new certificate is active.
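If you want to double-check from the jump box that the controller is now presenting the new certificate, a quick openssl query does the trick (avi-controller.corp.local is the controller FQDN in my lab):
$ echo | openssl s_client -connect avi-controller.corp.local:443 2>/dev/null | openssl x509 -noout -subject -dates
$ echo | openssl s_client -connect avi-controller.corp.local:443 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'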
There is one last piece of information missing in the NSX ALB configuration: the cluster labels. If you do not add any, all Tanzu Kubernetes Grid workload clusters will use the NSX ALB. If you add labels, only clusters labelled correctly, for example with
$ kubectl label cluster tkg-cluster-1 team=tkg-networking
will use it. The Avi Kubernetes Operator (AKO) on the workload clusters
$ kubectl -n avi-system get pods
NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 3d18h
translates Kubernetes resources (e.g., services of type LoadBalancer) into virtual services on the NSX ALB via its REST API. It is installed and configured by the AKO Operator (a meta-controller).
$ kubectl config use-context avi-mgmt-admin@avi-mgmt
$ kubectl -n tkg-system-networking get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
ako-operator-controller-manager 1/1 1 1 3d20h
The AKO Operator:
- runs on the management cluster
- orchestrates the installation of AKO in a selected group of workload clusters
- manages user credentials for AKO
- instructs AKO to clean up resources when a workload cluster is being deleted
The custom resource definition (CRD) reconciled by the AKO Operator, AKODeploymentConfig, has a label selector defined, which is used to match workload clusters. When a workload cluster has labels that match an AKODeploymentConfig, it will have the NSX ALB enabled. And when a workload cluster's labels match multiple AKODeploymentConfig objects, only the first one takes effect.
Tanzu Kubernetes Grid configures one AKODeploymentConfig, called install-ako-for-all, so you can only use one SE group of the NSX ALB by default. With a Tanzu Advanced or NSX ALB Enterprise license, you can go one step further and create multiple instances of AKODeploymentConfig using other SE groups and labels (a sketch of such a custom object follows below). This allows you to connect different workload clusters to specific SE groups. You can even enable L7 Ingress, but that’s a different story.
The following commands show the default AKODeploymentConfig configuration without any label settings.
$ kubectl get akodeploymentconfig
NAME AGE
install-ako-for-all 3d23h
$ kubectl get akodeploymentconfig install-ako-for-all -o yaml | tail -n 21
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: Default-Cloud
  controller: avi-controller.corp.local
  dataNetwork:
    cidr: 192.168.100.0/24
    name: VM-RegionA01-vDS-COMP
  extraConfigs:
    image:
      pullPolicy: IfNotPresent
      repository: projects.registry.vmware.com/tkg/ako
      version: v1.3.2_vmware.1
    ingress:
      defaultIngressController: false
      disableIngressClass: true
  serviceEngineGroup: Default-Group
$ kubectl get akodeploymentconfig -o json | jq '.items[].spec.clusterSelector'
null
With labels, the last output looks different:
$ kubectl get akodeploymentconfig -o json | jq '.items[].spec.clusterSelector'
{
"matchLabels": {
"team": "tkg-networking"
}
}
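For reference, here is a minimal sketch of what such an additional AKODeploymentConfig could look like, applied against the management cluster. The spec fields are reused from the default install-ako-for-all object above; the object name, SE group name, and the team=tkg-networking label are assumptions from my lab, and I have left out the extraConfigs block for brevity:
$ cat <<'EOF' | kubectl apply -f -
# apiVersion/kind as reported by the existing AKODeploymentConfig CRD in my environment
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: install-ako-for-networking-team
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: Default-Cloud
  controller: avi-controller.corp.local
  clusterSelector:
    matchLabels:
      team: tkg-networking
  dataNetwork:
    cidr: 192.168.100.0/24
    name: VM-RegionA01-vDS-COMP
  serviceEngineGroup: SE-Group-Networking-Team
EOF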
Let’s check if we can create a service of type LoadBalancer using the following YAML file:
$ cat tanzu/lb.yaml
apiVersion: v1
kind: Service
metadata:
  name: lb-svc
spec:
  selector:
    app: lb-svc
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-svc
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lb-svc
  template:
    metadata:
      labels:
        app: lb-svc
    spec:
      serviceAccountName: default
      containers:
      - name: nginx
        image: gcr.io/kubernetes-development-244305/nginx:latest
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
lb-svc-64f9549557-446zx 1/1 Running 0 10s
lb-svc-64f9549557-fnfpr 1/1 Running 0 10s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 1d1h
lb-svc LoadBalancer 100.65.50.94 192.168.100.55 80:32100/TCP 15s
In the NSX ALB UI, we can see the new virtual service, as expected.
More networking features
Other important networking improvements of Tanzu Kubernetes Grid 1.3, which I noted in the introduction, are:
- HTTP/S proxy configuration
- Experimental support for routed pods (NO-NAT)
- external-dns as an in-cluster Tanzu Kubernetes Grid extension
The configuration of the HTTP/S proxy settings in the UI is straightforward; just be careful with the NO PROXY entries.
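For a CLI-based deployment, the same settings map to proxy variables in the cluster configuration file. The proxy endpoint and NO PROXY list below are only examples from my lab; adjust them to your environment and make sure the node, pod, and service networks are excluded from proxying:
TKG_HTTP_PROXY_ENABLED: "true"
TKG_HTTP_PROXY: "http://proxy.corp.local:3128"
TKG_HTTPS_PROXY: "http://proxy.corp.local:3128"
TKG_NO_PROXY: "localhost,127.0.0.1,192.168.110.0/24,.corp.local,.svc,100.64.0.0/13,100.96.0.0/11"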
NSX-T is a mandatory requirement for the implementation of the experimental routed (NO_NAT) pods feature. All you have to do is add a few variables to the workload cluster definition:
SERVICE_DOMAIN: "corp.tanzu"
CLUSTER_CIDR: "100.96.0.0/11"
NSXT_MANAGER_HOST: "192.168.110.49"
NSXT_USERNAME: "admin"
NSXT_PASSWORD: "VMware1!VMware1!"
NSXT_POD_ROUTING_ENABLED: "true"
NSXT_ROUTER_PATH: "/infra/tier-1s/t1-tkg"
NSXT_ALLOW_UNVERIFIED_SSL: "true"
IMPORTANT: With Tanzu Kubernetes Grid 1.3 you should define each cluster in a separate YAML file and create it with a command like:
$ tanzu cluster create tkg-cluster-1 -f tkg-cluster-1.yaml -v 6
For a successful routed pod implementation, it is mandatory to advertise All Static Routes on the T1 gateway, and also on the T0 gateway if you are using, for example, BGP (which is the case in my setup).
Why is that? In the configuration above, we used the CLUSTER_CIDR 100.96.0.0/11 (this is the routable pod network); each cluster VM gets a /24 chunk of it (e.g., 100.96.1.0/24). The host IP of each VM is configured as the next hop for the network range it owns, as you can see in the Static Routes configuration of the T1 gateway. In addition to that, Antrea is configured in NoEncap mode when you enable routed pods (i.e., it assumes the node network can handle routing of pod traffic across nodes).
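A quick way to cross-check which /24 chunk each node owns is to look at the podCIDR field on the Kubernetes node objects:
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,POD-CIDR:.spec.podCIDR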
Let’s check if our configuration was deployed successfully.
$ kubectl -n kube-system get pod metrics-server-7c6765f9df-f8jcg -o json | jq '.status.podIP'
"100.96.1.2"
$ ping -c 3 100.96.1.2
PING 100.96.1.2 (100.96.1.2) 56(84) bytes of data.
64 bytes from 100.96.1.2: icmp_seq=1 ttl=60 time=2.47 ms
64 bytes from 100.96.1.2: icmp_seq=2 ttl=60 time=2.17 ms
64 bytes from 100.96.1.2: icmp_seq=3 ttl=60 time=1.64 ms
--- 100.96.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.637/2.092/2.470/0.344 ms
To retrieve and view all of the available NSX-T configuration options, you can use the following command:
$ grep NSXT ~/.tanzu/tkg/providers/config_default.yaml
NSXT_POD_ROUTING_ENABLED: false
NSXT_ROUTER_PATH: ""
NSXT_USERNAME: ""
NSXT_PASSWORD: ""
NSXT_MANAGER_HOST: ""
NSXT_ALLOW_UNVERIFIED_SSL: "false"
NSXT_REMOTE_AUTH: "false"
NSXT_VMC_ACCESS_TOKEN: ""
NSXT_VMC_AUTH_HOST: ""
NSXT_CLIENT_CERT_KEY_DATA: ""
NSXT_CLIENT_CERT_DATA: ""
NSXT_ROOT_CA_DATA_B64: ""
NSXT_SECRET_NAME: "cloud-provider-vsphere-nsxt-credentials"
NSXT_SECRET_NAMESPACE: "kube-system"
Authentication & authorization
The next major improvement is the OIDC/LDAPS configuration in the UI, which uses Pinniped/Dex as the back end. Similar to the NSX ALB (Avi) integration, the deployment of the whole setup is managed by components running on the management cluster, so you don’t have to worry about configuring it each time for the workload clusters. The following output shows a simple Active Directory configuration for a Tanzu CLI-based deployment.
$ grep LDAP tkg-mgmt-cluster.yaml
LDAP_BIND_DN: cn=Administrator,cn=Users,dc=corp,dc=tanzu
LDAP_BIND_PASSWORD: <encoded:Vk13YXJlMSE=>
LDAP_GROUP_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: controlcenter.corp.tanzu:636
LDAP_ROOT_CA_DATA_B64: XXXXXXXXX
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_USER_SEARCH_FILTER: (objectClass=person)
LDAP_USER_SEARCH_NAME_ATTRIBUTE: sAMAccountName
LDAP_USER_SEARCH_USERNAME: sAMAccountName
If you want to see all configuration options, search for LDAP or OIDC in the Tanzu config_default.yaml file as usual.
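For example:
$ grep -E 'LDAP|OIDC' ~/.tanzu/tkg/providers/config_default.yaml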
IMPORTANT: Do not forget to provide the callback URI to the OIDC provider. Also, LDAPS will not work with older TLS versions; you need at least TLS 1.2. Check the official Tanzu Kubernetes Grid 1.3 documentation for additional information.
When everything is set up correctly, a developer just needs to install the Tanzu CLI and connect to the Kubernetes API server to get the correct kubeconfig.
$ tanzu login --endpoint https://172.31.0.49:6443 --name tkg-mgmt-cluster
The Tanzu CLI will then open a browser window to log in with the developer account.
After completing this process, Kubernetes resources can be accessed/created according to the role-based access control rules the cluster administrator has configured. A Tanzu Kubernetes Grid 1.3 cluster has approximately 80 roles available to choose from.
$ kubectl create clusterrolebinding test-rb --clusterrole edit --user test
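In practice, you will usually bind roles to LDAP/AD groups (as resolved by the LDAP_GROUP_SEARCH settings above) rather than to individual users; tkg-developers below is just a group from my lab directory:
$ kubectl create clusterrolebinding tkg-developers-edit --clusterrole edit --group tkg-developers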
This is the end, my friend
To wrap up, here is a command that shows the kind of information you get from the metrics-server, which is installed by default on the management and workload clusters.
$ kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
antrea-controller-577bf7c894-8ldtv 12m 38Mi
etcd-nsxt-cluster-1-control-plane-rgg2j 100m 62Mi
kube-apiserver-nsxt-cluster-1-control-plane-rgg2j 205m 408Mi
kube-controller-manager-nsxt-cluster-1-control-plane-rgg2j 33m 68Mi
kube-proxy-6lw8k 1m 23Mi
And finally, some commands that demonstrate how to use the new TanzuKubernetesRelease CRD.
$ tanzu kubernetes-release get
NAME VERSION COMPATIBLE UPGRADEAVAILABLE
v1.17.16---vmware.2-tkg.1 v1.17.16+vmware.2-tkg.1 True True
v1.18.16---vmware.1-tkg.1 v1.18.16+vmware.1-tkg.1 True True
v1.19.8---vmware.1-tkg.1 v1.19.8+vmware.1-tkg.1 True True
v1.20.4---vmware.1-tkg.1 v1.20.4+vmware.1-tkg.1 True False
$ tanzu kubernetes-release available-upgrades get v1.19.8---vmware.1-tkg.1
NAME VERSION
v1.20.4---vmware.1-tkg.1 v1.20.4+vmware.1-tkg.1
$ tanzu cluster upgrade tkg-cluster-1
$ tanzu cluster upgrade tkg-cluster-1 --tkr v1.20.1---vmware.1-tkg.2
$ tanzu cluster upgrade tkg-cluster-1 --os-name ubuntu --os-version 20.04
The meaning of the version string v1.17.16+vmware.2-tkg.1 used for installations or updates is the following:
- v1.17.16 – upstream version of Kubernetes used
- vmware.2 – binaries compiled & signed by VMware
- tkg.1 – Tanzu Kubernetes Grid software added on top of it
That’s all for today, but you’ll most likely agree that Tanzu Kubernetes Grid 1.3 is a pretty amazing release.