vSAN Strategy

Multi-Cluster Upgrading Strategies for vSAN

VMware continues to introduce new levels of performance, capability, robustness, and ease of use to vSAN, but each vSAN cluster must be updated to benefit from these improvements. Although the upgrade process continues to be streamlined, environments running multiple vSAN clusters can benefit from specific practices that deliver a more efficient upgrade experience.

vCenter Server Compatibility

In a multi-cluster environment, vCenter Server must be running a version equal to or greater than the version to be installed on any host in the clusters it manages. Keeping vCenter Server on the latest edition ensures compatibility with all potential host versions running in a multi-cluster arrangement, and introduces enhancements to vCenter that are independent of the clusters it manages.
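The rule above can be expressed as a simple pre-upgrade check. The following is a minimal sketch only: the function names and version strings are hypothetical, and it assumes versions are plain dotted numbers with the same number of components. Real compatibility should always be confirmed against VMware's published interoperability information.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def hosts_newer_than_vcenter(vcenter_version, target_host_versions):
    """Return the planned host versions that are newer than vCenter Server.

    An empty list means vCenter is at an equal or greater version than
    every host version planned for the clusters it manages.
    """
    vcenter = parse_version(vcenter_version)
    return [v for v in target_host_versions if parse_version(v) > vcenter]


# Hypothetical example: vCenter at 7.0.2, clusters planned for three versions.
blocking = hosts_newer_than_vcenter("7.0.2", ["6.7.0", "7.0.2", "7.0.3"])
print(blocking)
```

Any version returned by the check indicates that vCenter Server should be upgraded before that host version is rolled out to a cluster.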

Recommendation: Periodically check that vCenter is running the latest edition. The vCenter Server Appliance Management Interface (VAMI) can be accessed using https://vCenterFQDN:5480

Phasing in new versions of vSAN

vSAN is a cluster-based storage solution. It only uses the hosts in that specific cluster to provide storage and data services to the VMs running in that cluster. Therefore, upgrades should be approached on a per-cluster basis, not a per-host basis. With multi-cluster environments, IT teams can use a strategy of phasing in a new version of vSAN on a per-cluster basis to meet any of their own vetting, documentation, and change control practices. Similar to common practices in application maintenance, upgrades can be phased in on less-critical clusters for testing and validation prior to rolling out the upgrade into more critical clusters.

Figure 1. Phasing in new versions of vSAN on a per-cluster basis
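As an illustration of the phased approach, the sketch below groups clusters into upgrade waves ordered from least to most critical. The cluster names and the numeric criticality scale are invented for this example; an actual plan would follow an organization's own classification and change control process.

```python
from itertools import groupby

# Hypothetical inventory: (cluster name, criticality), where 1 = least critical.
clusters = [
    ("prod-db", 3),
    ("test-lab", 1),
    ("vdi", 2),
    ("dev", 1),
]


def plan_waves(inventory):
    """Group clusters into upgrade waves, least critical first."""
    ordered = sorted(inventory, key=lambda item: item[1])
    return [
        [name for name, _ in group]
        for _, group in groupby(ordered, key=lambda item: item[1])
    ]


waves = plan_waves(clusters)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Each wave would be validated before the next one begins, so any issue surfaces on the less-critical clusters first.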

vSphere Update Manager (VUM) respects this per-cluster upgrade approach as well. In vSAN 6.6.1 and newer, it checks the release catalog and the HCL against what it detects in the environment, and upgrades the cluster to the latest edition of the hypervisor that the environment supports.

Procedures for updating clusters should not be limited to hypervisor upgrades. They should also include recommended firmware and driver updates for storage controllers, NICs, and BIOS versions. While vSAN can update firmware on some storage controllers, updating other discrete components may be a manual process that is unique to the hardware used in that cluster.

Recommendation: When performing an update, update to the very latest version of vSAN available. If a cluster is several versions behind, there is no need to step through the intermediate versions to reach the latest. The latest edition will have had more testing, and typically brings greater intelligence about the product and the conditions it is running in.

Parallel Upgrades

While vSAN limits the upgrade process to one host at a time within a vSAN cluster, cluster upgrades can be performed concurrently if desired. This can speed up the completion time of updating hosts across larger data centers. The decision to update one cluster at a time versus multiple clusters at a time will be at the discretion of the organization based on understanding the tradeoffs, and procedural limitations within the organization.

If there is a desire to update more hosts simultaneously, this should be factored into the vSAN cluster sizing strategy. More clusters with fewer hosts each will allow for more parallel remediation than fewer clusters with more hosts each. For example, an environment with 280 hosts that remediates all of its clusters concurrently would be able to cut remediation time by roughly half if the cluster design called for 20 clusters of 14 hosts per cluster, as opposed to 10 clusters of 28 hosts per cluster. Since a vSAN cluster is its own discrete storage system serving the needs of its own cluster, administrators may also find greater agility in operations and troubleshooting with smaller clusters. The decision process of host counts and cluster sizing is discussed in great detail at: vSAN Cluster Design – Large Clusters Versus Small Clusters
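The arithmetic behind the 280-host example can be worked through with a short sketch. It assumes every host takes the same time to remediate (one hour here, purely for illustration), that each cluster updates one host at a time, and that all clusters are remediated concurrently. The function and parameter names are hypothetical.

```python
import math


def wall_clock_hours(cluster_count, hosts_per_cluster, hours_per_host,
                     concurrent_clusters):
    """Estimate elapsed time when each cluster updates one host at a time."""
    # Clusters are processed in batches of `concurrent_clusters`.
    batches = math.ceil(cluster_count / concurrent_clusters)
    # Within a batch, elapsed time is set by the hosts in a single cluster.
    return batches * hosts_per_cluster * hours_per_host


small_clusters = wall_clock_hours(20, 14, 1.0, concurrent_clusters=20)
large_clusters = wall_clock_hours(10, 28, 1.0, concurrent_clusters=10)

print(small_clusters, large_clusters, small_clusters / large_clusters)
```

Under these assumptions the 20-cluster design finishes in half the elapsed time of the 10-cluster design. Lowering `concurrent_clusters` shows how a policy of updating fewer clusters at once erodes that advantage.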

Larger environments with multiple vSAN clusters may be more likely to have different generations of hardware. Since drivers and firmware can be potential sources of issues during an update process, concurrent cluster upgrades may introduce operational challenges to those managing and troubleshooting updates. Depending on the age and type of hardware, a new version of vSAN could first be deployed as a pilot effort to a few clusters individually. Then, based on the results, it could be introduced to a larger number of clusters simultaneously. As with any type of update in the data center, practice due diligence in determining what level of simultaneous updates is acceptable for your own organization.

Recommendation: For cluster updates, focus on the efficient delivery of services while the cluster is being updated, as opposed to the speed at which the cluster is updated. vSAN restricts parallel host remediation. A well-designed and well-operating cluster will seamlessly roll through updating all of the hosts in the cluster without interfering with expected service levels. In other words, the speed at which the cluster is updated is less important than the efficient delivery of resources to the VMs powered by the cluster.

Why are vSAN clusters restricted to updating one host at a time? Limiting host updates to a single host per cluster helps reduce the complexity of subtracting not only compute resources from a cluster, but storage capacity and storage performance from a cluster. Factoring in available capacity in addition to compute resources is unique to an HCI architecture. Total available host count can also become important for some data placement policies such as an FTT=3 using mirroring, or an FTT=2 using RAID-6 erasure coding. Limiting the update process to one host at a time per cluster also helps avoid this complexity, while reducing the potential need for data movement due to resynchronizations.
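To illustrate why host count matters for these policies, the sketch below compares the hosts remaining during maintenance with the minimum a policy needs. The minimums used are not stated in this article; they are the commonly documented figures for vSAN's original architecture (2 x FTT + 1 hosts for mirroring, 4 hosts for RAID-5, 6 hosts for RAID-6) and should be verified against the documentation for the version in use. The function names are hypothetical.

```python
def min_hosts(ftt, method):
    """Minimum hosts for a policy (assumed figures, see the note above)."""
    if method == "mirroring":
        return 2 * ftt + 1
    if method == "erasure_coding":
        # RAID-5 covers FTT=1 (3 data + 1 parity),
        # RAID-6 covers FTT=2 (4 data + 2 parity).
        return {1: 4, 2: 6}[ftt]
    raise ValueError(f"unknown method: {method}")


def spare_hosts_during_update(total_hosts, ftt, method, hosts_in_maintenance=1):
    """Hosts left beyond the policy minimum while some are in maintenance."""
    return total_hosts - hosts_in_maintenance - min_hosts(ftt, method)


# A 7-host cluster using FTT=3 mirroring has no headroom with one host out.
print(spare_hosts_during_update(7, ftt=3, method="mirroring"))
# An 8-host cluster using FTT=2 RAID-6: one host out vs. two hosts out.
print(spare_hosts_during_update(8, ftt=2, method="erasure_coding"))
print(spare_hosts_during_update(8, ftt=2, method="erasure_coding",
                                hosts_in_maintenance=2))
```

A negative result means the remaining hosts fall below the policy minimum during the update, and zero means there is no spare host to rebuild onto if a failure occurs. Each additional host taken out of service at the same time shrinks that margin further.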

Summary

For data centers, the availability of services generally takes precedence over everything else. Environments consisting of multiple vSAN clusters can take advantage of their unique, modular topology by phasing in upgrades to the hypervisor on a per-cluster basis, along with any dependent hardware updates including storage controllers, NICs, and BIOS versions.

@vmpete