When using vSAN Encryption, one of the vSAN Health Check tests will show the health of the connection between the vSAN Hosts and the KMS Cluster as well as vCenter and the KMS Cluster.
One scenario came up a few weeks ago where the vSAN Health Check indicated that the vSAN Hosts could properly communicate with the KMS Cluster, but the vCenter server had intermittent connectivity to the KMS Cluster.
Troubleshooting indicated that there were no blocked ports between the vCenter Server and the KMS Cluster as well as they were able to properly ping each other. vSAN Hosts could properly ping the KMS Cluster as well, and no ports were blocked.
Here is the vSAN Health Check’s reported error for the vCenter KMS Status.
Notice that the certificate status is valid, but the connection and trust statuses are not.
Looking at the Host KMS Status it can be seen that the hosts are properly communicating with the KMS Server.
The process of enabling vSAN Encryption includes the following steps:
- A KMS Connection Profile is created in vCenter and the trust is established.
- vSAN Encryption is enabled in the Configuration>Data Services menu in the vSAN UI.
- The KMS Connection Profile is pushed to each of the ESXi hosts, they use the kekId and hostkeyId in this profile to retrieve the KEK and HostKey for the vSAN Cluster.
The connection has to be correct in vCenter Server before it can be correct/pushed to vSAN Hosts. Something must have changed in the environment to cause this issue.
Further investigation indicated that the connectivity to the KMS Cluster was intermittent. Sometimes the vCenter KMS Status reported green and other times reported red. So maybe nothing changed.
Careful review of the vCenter KMS Status and Host KMS Status health checks, the KMS Alias is a “short name”.
Maybe there is an issue where the short name is intermittently resolved from DNS… But the vSAN Hosts were not showing any intermittent connectivity, only the VCSA.
The Key Management Servers configuration Profile in the vCenter’s settings shows that the trust cannot be established. The KMS Address is the same value as the KMS Alias in the vSAN Health Check.
When using a short name, the default TCP/IP stack of a vSAN host uses designated search domains in the name resolution process. In the case of this cluster, demo.local and demo.central can be used in short name resolution.
The VCSA, on the other hand, does not have any search domains:
Without search domains to assist with the short name, vCenter would rely on the DNS server for name resolution.
The suggestion was made to change the KMS Address value for each KMS Cluster node to either an IP address or the Fully Qualified Domain Name (FQDN). Changing one of the two KMS entries showed some success.
Adjusting the KMS Address for the alternate KMS Cluster node cleared the issue up entirely.
In the case that this was brought up, an alternate vCenter had no issues connecting to the KMS Cluster, but an IP address was used instead of a short name. Without digging into DNS configurations of the environment, setting the Fully Qualified Domain Name (FQDN) resolved the issue.
In short, when configuring the Key Management Server connection profile for a KMS Cluster, ensure that the KMS Address is one that vCenter and vSAN hosts can correctly resolve. Using a Fully Qualified Domain Name or IP address can prevent “short name” related issues.