Replacing a vCenter server for existing vSAN hosts

Because VMware vCenter is the common control and management plane for a vSphere cluster, questions often arise about how a vSAN cluster reacts when a vCenter server must be rebuilt from a new installation or restored from a backup. Accounting for unplanned events is always a top-of-mind concern for data center administrators. The scenario shown below describes how vSAN and vCenter behave when hosts that previously participated in a vSAN cluster are added to a new, pristine vCenter server.

While vCenter plays an important role in the interactive management of a vSphere cluster, vSAN is sufficiently decoupled from vCenter to ensure continued operation if vCenter is offline or rebuilt from a new installation. vCenter has never been responsible for any data path or object management activities with vSAN.

vSAN 6.6 transitioned from multicast to unicast for all host membership activities. Under the new architecture, membership information is maintained in vCenter and is also distributed across the hosts in the cluster. The vCenter authority health check is a new health check introduced in vSAN 6.6.1 to verify host membership consistency between vCenter and the hosts in the cluster. This health check will check for, and remediate, any consistency issues seen with host membership and settings. It can be especially useful in scenarios that include adding hosts to a vSAN cluster while vCenter was offline, restoring an older vCenter server from backup, and creating a new installation of vCenter Server for an existing vSAN cluster. John Nicholson describes this nicely in the post vCenter Recoverability Improvements.
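To make the idea of membership consistency concrete, the following is a minimal sketch in Python. It is not how the vSAN health service is implemented; it assumes a simplified model in which vCenter holds a set of member host names and each host holds a list of its unicast peers (every member except itself). The host names and the function name `membership_mismatches` are hypothetical.

```python
from typing import Dict, List, Set


def membership_mismatches(
    vcenter_members: Set[str],
    host_peer_lists: Dict[str, Set[str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Compare vCenter's member list with each host's peer list.

    Simplified model (an assumption for illustration): a consistent host
    lists every other cluster member as a peer, and nothing else.
    """
    report: Dict[str, Dict[str, List[str]]] = {}
    for host in sorted(vcenter_members):
        expected = vcenter_members - {host}          # everyone but itself
        actual = host_peer_lists.get(host, set())    # what the host knows
        missing = sorted(expected - actual)
        unexpected = sorted(actual - expected)
        if missing or unexpected:
            report[host] = {"missing": missing, "unexpected": unexpected}
    return report


# Hypothetical four-node cluster where esx03 never learned about esx04.
vcenter_members = {"esx01", "esx02", "esx03", "esx04"}
host_peer_lists = {
    "esx01": {"esx02", "esx03", "esx04"},
    "esx02": {"esx01", "esx03", "esx04"},
    "esx03": {"esx01", "esx02"},
    "esx04": {"esx01", "esx02", "esx03"},
}

report = membership_mismatches(vcenter_members, host_peer_lists)
print(report)
```

In this model, only esx03 is reported, with esx04 listed as a missing peer. A real health check also has to decide which side is authoritative before remediating, which this sketch leaves out.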

Additional cluster health checks can aid in the effort of rebuilding a vCenter server for an existing vSAN cluster. The vSAN health service can check for the consistency of cluster-wide settings such as deduplication and compression, encryption, and fault domains. Cluster-wide settings like these are easily overlooked by an administrator when building a new vCenter server, creating a new data center and cluster, and adding vSAN hosts previously associated with a vCenter server that is no longer available.
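The kind of comparison such a configuration consistency check performs can be sketched as follows. This is an illustrative model only, assuming cluster-wide settings and host-reported settings are available as plain dictionaries; the keys (`dedup_compression`, `encryption`, `fault_domains`, `fault_domain`) and the function name are hypothetical and do not correspond to any vSAN API.

```python
from typing import Any, Dict, List


def find_inconsistencies(
    cluster: Dict[str, Any],
    hosts: Dict[str, Dict[str, Any]],
) -> List[str]:
    """List settings where a host disagrees with the cluster definition."""
    issues: List[str] = []
    for name, host in sorted(hosts.items()):
        # Cluster-wide on/off services should match on every host.
        for key in ("dedup_compression", "encryption"):
            if host[key] != cluster[key]:
                issues.append(
                    f"{name}: {key} is {host[key]} on the host "
                    f"but {cluster[key]} in vCenter"
                )
        # Fault domain membership is compared per host.
        expected_fd = cluster["fault_domains"].get(name)
        if host["fault_domain"] != expected_fd:
            issues.append(
                f"{name}: fault domain is {host['fault_domain']!r} on the "
                f"host but {expected_fd!r} in vCenter"
            )
    return issues


# Hypothetical new vCenter cluster created with default settings.
cluster = {"dedup_compression": False, "encryption": False, "fault_domains": {}}

# Hosts still carry the settings applied by the previous vCenter server.
hosts = {
    "esx01": {"dedup_compression": True, "encryption": False, "fault_domain": None},
    "esx02": {"dedup_compression": True, "encryption": False, "fault_domain": None},
}

issues = find_inconsistencies(cluster, hosts)
for issue in issues:
    print(issue)
```

Here both hosts are flagged because deduplication and compression is enabled on the hosts but not in the newly created cluster, which mirrors the situation described in the scenario that follows.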

Scenario and remediation options

One or more cluster health check failures may surface when vSAN hosts that were managed by a vCenter server that is no longer available are added to a cluster on a new vCenter server. As shown in Figure 1, the “vCenter state is authoritative” and the “vSAN cluster configuration consistency” health checks both failed. In this scenario, the health check not only recognized that this is a new vCenter server that did not previously manage the hosts, it also identified that one or more cluster settings are inconsistent with the settings residing on the hosts. The message goes on to state that deduplication and compression is enabled on the hosts, but not on the cluster.

Figure 1. vSAN Cluster configuration consistency

The remediation of the health check errors shown above will depend on the specific health checks that failed, the configuration of the cluster, and the steps taken by the administrator.

In this example, the remediation options below apply to hosts that were in a vSAN cluster with deduplication and compression enabled, where the original vCenter server is no longer available. The hosts were added to a newly created vCenter server, where the cluster-wide setting of deduplication and compression is not enabled.

Option #1. Ticking the deduplication and compression checkbox in vCenter. This will enable deduplication and compression as seen by vCenter, and will provide consistency between vCenter and the hosts. Since the hosts were already running deduplication and compression, this action involves only a small metadata update, and does not introduce the rolling disk group evacuations common with enabling or disabling deduplication and compression. This will also eliminate the vCenter Authority health check alert, as vCenter will update the generation ID and be identified as the source of truth after the change.

Option #2. Temporarily removing the recently added hosts from the new vCenter server, ticking the cluster-wide deduplication and compression checkbox in vCenter, then re-adding the hosts. This will eliminate the previously generated “vSAN cluster configuration consistency” failure, and leave only the “vCenter state is authoritative” health check failure. In this scenario, remediation of the “vCenter state is authoritative” health check would be a very lightweight effort, as there would be no other inconsistencies with cluster-wide settings.

Option #3. Clicking on “Remediate inconsistent configuration” in the health check UI. The current vSAN cluster-wide settings as defined in vCenter will be pushed down to all hosts participating in the vSAN cluster. In this case, this will kick off rolling disk group evacuations across all vSAN hosts to apply the setting of deduplication and compression NOT enabled. Any enabling or disabling of deduplication and compression on an active cluster can be a resource-intensive operation, and is discouraged.

Option #4. Clicking on “Update ESXi Configuration” in the health check UI. This is similar to option #3, in that the vSAN cluster-wide settings of the new vCenter server will be pushed down to all hosts participating in the vSAN cluster. A warning describes the impact of this change, as shown in Figure 2. Depending on the settings, this could be a lightweight metadata change, such as updating the generation ID, or, as in this scenario, a resource-intensive operation, as it would push a new deduplication and compression setting to each host in the cluster.

Figure 2. The confirmation dialog box for “Update ESXi Configuration”

In situations where the cluster-wide services are consistent, the only cluster health check alert may be the “vCenter state is authoritative” alert. This can be a lightweight fix, made by vCenter updating the generation ID on the hosts to reestablish consistency.
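The difference between a lightweight and a resource-intensive remediation can be summarized with a small decision sketch. It assumes a deliberately reduced model with just two fields per side, a deduplication and compression flag and a generation ID; the field names, the `Impact` categories, and the sample generation ID values are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Any, Dict


class Impact(Enum):
    NONE = "no change needed"
    METADATA_ONLY = "lightweight metadata update"
    RESOURCE_INTENSIVE = "resource-intensive rolling change"


def push_impact(vcenter: Dict[str, Any], host: Dict[str, Any]) -> Impact:
    """Estimate the effect of pushing vCenter's view down to a host."""
    # A differing dedup/compression setting changes the on-disk format,
    # so it dominates any other difference.
    if vcenter["dedup_compression"] != host["dedup_compression"]:
        return Impact.RESOURCE_INTENSIVE
    # Settings agree; only the generation ID needs to be refreshed.
    if vcenter["generation_id"] != host["generation_id"]:
        return Impact.METADATA_ONLY
    return Impact.NONE


# Host state left behind by the old vCenter server (hypothetical values).
host = {"dedup_compression": True, "generation_id": "gen-old"}

# New vCenter server with the cluster created at default settings.
new_vcenter = {"dedup_compression": False, "generation_id": "gen-new"}

# Pushing the new vCenter settings as-is (options #3 and #4 above).
as_is = push_impact(new_vcenter, host)

# Ticking the checkbox in vCenter first (option #1 above).
after_option_1 = push_impact({**new_vcenter, "dedup_compression": True}, host)

print(as_is.value)
print(after_option_1.value)
```

Under these assumptions, aligning vCenter with the hosts before pushing any configuration turns a resource-intensive change into a metadata-only one.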

For clusters using vSAN encryption, additional steps may be necessary when replacing vCenter. Dave Morera describes his experiences in his post Replacing vCenter with vSAN encryption enabled, which is just one example of the additional factors to consider. Using an isolated lab to test the procedure specific to your environment is highly recommended.

Additional recommendations

Introducing a new vCenter server to an existing vSAN cluster can be made easier by adopting the following practices:

  • Run the latest version of vCenter. After the initial installation of a new vCenter server, always run “Check Updates” in the vCenter Appliance Management Interface, as shown in Figure 3. This helps ensure that vCenter is running the latest version, and is compatible with the version of ESXi running on the hosts.

Figure 3. Updating vCenter using the vCenter Appliance Management Interface

  • Add licensing.  Add the vSphere host, vCenter, and vSAN licenses to vCenter prior to adding the hosts to streamline the process of adding the hosts and enabling services.
  • Set and verify cluster-wide settings. Ensure that as many cluster-wide settings as possible are configured the same on the new vCenter server as they were on the old vCenter server. This would include, but is not limited to, data center and cluster object names, HA and DRS settings, as well as all vSAN cluster-wide settings.
  • Verify your protection strategies for vCenter. Ensure that application and system level protection of your vCenter server is in place per organizational requirements. This might include guest-level backups, and exporting of configuration settings such as SPBM policies, vSphere distributed switches (VDS) and other items. This can make restoration efforts much easier.
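As a rough illustration of the "set and verify" practice above, the sketch below compares a record of the old vCenter server's settings with what has been configured on the new one. It assumes the settings have been written down or exported by some means into nested dictionaries; the structure, key names, and values are hypothetical and are not an export format produced by vCenter.

```python
from typing import Any, Dict, Tuple


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested settings into dotted paths, e.g. 'vsan.encryption'."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def settings_drift(
    recorded: Dict[str, Any],
    current: Dict[str, Any],
) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (recorded_value, current_value)} for every difference."""
    old, new = flatten(recorded), flatten(current)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }


# Settings documented from the old vCenter server (hypothetical).
recorded = {
    "datacenter": "DC01",
    "cluster": "Cluster01",
    "ha": {"enabled": True},
    "drs": {"enabled": True, "automation": "fullyAutomated"},
    "vsan": {"dedup_compression": True, "encryption": False},
}

# Settings as configured so far on the new vCenter server (hypothetical).
current = {
    "datacenter": "DC01",
    "cluster": "Cluster01",
    "ha": {"enabled": True},
    "drs": {"enabled": True, "automation": "manual"},
    "vsan": {"dedup_compression": False, "encryption": False},
}

drift = settings_drift(recorded, current)
for path, (was, now) in drift.items():
    print(f"{path}: was {was!r}, now {now!r}")
```

Resolving every reported difference before adding the hosts reduces the chance of a configuration consistency failure once they join the new cluster.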

Conclusion

Maintaining availability of a vCenter server is a desired goal for any data center powered by vSphere and vSAN. In situations where the recovery of a vCenter server is not possible, the architecture of vSAN, paired with the continued improvement of the integrated health checks for vSAN in vCenter, allows for an easy, predictable experience when introducing a new vCenter server to an existing vSAN cluster. For more information on this topic, see Recovering a vCenter Server and vCenter Recovery Example with vSAN on StorageHub.