vSAN Operations: Use separate SPBM policies for VMs in stretched clusters

vSAN stretched clusters are an easy, fast, and flexible way to deliver cluster-level redundancy across sites using a capability built right into vSphere. Because stretched clustering is enabled at the cluster level, a mix of stretched and non-stretched clusters can easily coexist and all be managed by the same vCenter Server.

This flexibility can lead to operational decisions in the management of Storage Policy-Based Management (SPBM) policies: the rules that govern the performance and protection requirements for your VMs. A few policy rules behave slightly differently when they are used in a stretched cluster. Therefore, creating and using separate, purpose-built storage policies specifically for VMs in stretched clusters is recommended for both single-cluster and multi-cluster environments. Let’s go into more detail about this recommendation.

Two storage policy rules will be the focus of this post as we look at differences between non-stretched vSAN clusters and stretched vSAN clusters.

“Failure Tolerance Method” (FTM): Defines the actual data placement, or parity method used to tolerate a failure. The FTM can be set to “RAID-1 (Mirroring)” or “RAID-5/6 (Erasure Coding).”
“Failures to Tolerate” (FTT): Defines the number of failures an object can tolerate while still being accessible. Valid preset FTT values for RAID-1 object mirroring would be from 0 – 3, while RAID-5/6 supports an FTT of 1 – 2.

In any type of vSAN environment, the FTM description of “RAID-5/6 (Erasure Coding)” refers to two RAID levels. The actual RAID scheme used under this policy setting is dictated by the FTT level assigned in the policy. A policy setting of FTT=1 means that assigned VMs will use RAID-5, while FTT=2 means that assigned VMs will use RAID-6. For clarity, the SPBM policy names used in this post were chosen to help explain their settings. In practice, they can be named whatever suits an environment best.
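The FTM and FTT pairing described above can be sketched as a small lookup. This is only an illustration of the rule as stated in this post, not a vSAN API: the function name `raid_scheme` and the constants are hypothetical, and the label returned for FTT=0 is simply a description of a single, unprotected copy.

```python
MIRRORING = "RAID-1 (Mirroring)"
ERASURE_CODING = "RAID-5/6 (Erasure Coding)"


def raid_scheme(ftm: str, ftt: int) -> str:
    """Return the RAID scheme implied by an FTM/FTT pair (hypothetical helper)."""
    if ftm == MIRRORING:
        if not 0 <= ftt <= 3:
            raise ValueError("RAID-1 mirroring supports FTT values 0 through 3")
        # FTT=0 tolerates no failures, so only a single copy is kept.
        return "single copy (no redundancy)" if ftt == 0 else "RAID-1"
    if ftm == ERASURE_CODING:
        if ftt == 1:
            return "RAID-5"
        if ftt == 2:
            return "RAID-6"
        raise ValueError("RAID-5/6 erasure coding supports FTT values 1 and 2")
    raise ValueError(f"unknown FTM: {ftm!r}")
```

For example, `raid_scheme(ERASURE_CODING, 2)` returns `"RAID-6"`, while an FTT of 3 with erasure coding is rejected.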

Adjusted definition for vSAN 6.6 and newer

vSAN 6.6 introduced the ability to assign an additional, secondary layer of protection when spanning a mirrored copy of data across the two sites of a stretched cluster. This secondary “local protection” needed to fit within the existing policy structure, so vSAN 6.6 now has two failures-to-tolerate levels: a “Primary Level of Failures to Tolerate” (PFTT) and a “Secondary Level of Failures to Tolerate” (SFTT). Where each one applies in the topology depends on whether the stretched cluster feature is enabled on a specific vSAN cluster.

With a non-stretched cluster in vSAN 6.6, the Failures to Tolerate (FTT) rule is now called “Primary Level of Failures to Tolerate” (PFTT). It defines the number of failures to tolerate within a cluster at a single, local site. Just as described earlier, the PFTT in this case can have a setting of 0 – 3, depending on the circumstances and the FTM chosen. The FTM options available for a non-stretched cluster are RAID-1 and RAID-5/6. Figure 1 shows how the FTM and the PFTT are represented in a non-stretched cluster.

Figure 1. How assigned FTM and PFTT policy rules look in a non-stretched cluster

With a stretched cluster, the PFTT definition is different. The PFTT defines the number of failures to tolerate across the two sites, with a valid setting of 0 or 1. A setting of 0, paired with an “affinity” rule, is a way of setting site affinity, and means that the object is not protected across sites. This is a useful setting for VMs that already have availability mechanisms at the application layer, or that do not need cross-site availability.

A “Secondary Level of Failures to Tolerate” (SFTT) defines the number of failures an object can tolerate within each local site. Valid preset SFTT values for RAID-1 object mirroring are 0 – 3, while RAID-5/6 supports an SFTT of 1 – 2. The FTM chosen in the policy remains the way to define the data placement method (mirroring versus erasure coding) used to tolerate a failure. Figure 2 shows how the PFTT and SFTT are represented in a stretched cluster.

Figure 2. How assigned FTM, PFTT and SFTT policy rules look in a stretched cluster

In a stretched cluster environment, the implied FTM across sites is always a RAID-1 mirror. The FTM setting in the policy definition can be set to either “RAID-1 (Mirroring)” or “RAID-5/6 (Erasure Coding).” In a stretched cluster, the FTM rule determines the RAID scheme used for the secondary local protection (SFTT), when it is defined.
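To make the two layers concrete, here is a minimal sketch that turns a stretched cluster policy into the protection used across sites and within a site. The `StretchedPolicy` class and `describe_placement` function are hypothetical names for illustration, and treating SFTT=0 as "no local protection" is an assumption drawn from the "when it is defined" wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StretchedPolicy:
    pftt: int  # failures tolerated across sites: 0 or 1
    sftt: int  # failures tolerated within a site
    ftm: str   # "RAID-1" or "RAID-5/6", applies to the SFTT layer only


def describe_placement(policy: StretchedPolicy) -> dict:
    """Return the RAID scheme at each layer (illustrative only)."""
    if policy.pftt not in (0, 1):
        raise ValueError("PFTT in a stretched cluster must be 0 or 1")
    # Across sites the scheme is always a mirror; PFTT=0 means site affinity.
    across = "RAID-1" if policy.pftt == 1 else "none"
    if policy.sftt == 0:
        within = "none"  # assumption: SFTT=0 means no local protection
    elif policy.ftm == "RAID-1" and 1 <= policy.sftt <= 3:
        within = "RAID-1"
    elif policy.ftm == "RAID-5/6" and policy.sftt in (1, 2):
        within = "RAID-5" if policy.sftt == 1 else "RAID-6"
    else:
        raise ValueError("unsupported FTM/SFTT combination")
    return {"across_sites": across, "within_site": within}
```

A policy of PFTT=1, SFTT=1, and FTM of RAID-5/6 therefore describes a mirror across the two sites with RAID-5 inside each site.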

Behaviors when enabling or disabling stretched clusters without explicitly defined SPBM policies

How does vSAN handle VMs with non-stretched cluster specific storage policies when transitioning from a non-stretched cluster to a stretched cluster? Take a look at Figure 3.

Figure 3. Behavior of non-stretched cluster policies when transitioning to a stretched cluster

Once stretched clustering is enabled and configured in a vSAN cluster, VMs that still use non-stretched policies do not receive any secondary, local protection logic for data placement. The PFTT that applied within the local site before stretched clustering was enabled now applies across sites. As a result, depending on its original settings, the policy may no longer be able to provide compliance.
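The reinterpretation described above can be sketched as follows. This is a simplified model, not vSAN's actual compliance engine: the function name is hypothetical, and the compliance test is an inference from the valid cross-site PFTT values (0 or 1) stated earlier in this post.

```python
def reinterpret_after_stretch(pftt: int) -> dict:
    """Model how a non-stretched policy reads once the cluster is stretched."""
    return {
        "pftt_scope": "across sites",  # the PFTT no longer applies locally
        "sftt": None,                  # no secondary, local protection is implied
        # Assumption: only PFTT values valid across two sites can be satisfied.
        "compliant": pftt in (0, 1),
    }
```

Under this model, a policy written with PFTT=1 carries over cleanly, while a policy written with PFTT=2 or PFTT=3 for a local site would be flagged for review.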

Let’s look at the behavior in the other direction, where we have VMs with stretched cluster specific storage policies, and transition from a stretched to a non-stretched cluster. Figure 4 shows this behavior.

Figure 4. Behavior of stretched cluster policies when transitioning to a non-stretched cluster

When disabling a stretched cluster with policies built for stretched clusters, you may find some artifacts from the data placement and arrangement of the objects. vSAN is smart enough to clean this up, and will do so when you apply policies that do not have stretched cluster specific rules in them.

Now let’s look at what happens when we attempt to change a VM’s policy to a stretched cluster specific storage policy, even though the cluster does not have stretched clustering configured. Figure 5 details this behavior.

Figure 5. Attempting to apply stretched cluster specific policies when cluster is not stretched

In this case, we are attempting to change a VM’s policy to one specifically designed for a stretched cluster (perhaps the policy was built when stretched clustering had been enabled in a previous scenario), while the cluster is not currently configured as a stretched cluster. vSAN will not allow this. Furthermore, a policy that includes these rules will not be visible in the UI once stretched clustering has been disabled. In vSAN 6.6 and 6.6.1, the SFTT rule and any other rule specific to stretched clusters (e.g. the “affinity” rule) cannot be set or configured in a policy until stretched clustering is enabled.
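A pre-check along these lines can be sketched in a few lines. The `Policy` class, its field names, and `can_assign` are hypothetical and only model the rule described above; they are not part of any vSAN or SPBM SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    pftt: int
    sftt: Optional[int] = None           # stretched cluster specific rule
    site_affinity: Optional[str] = None  # stretched cluster specific rule


def has_stretched_rules(policy: Policy) -> bool:
    """True when the policy carries any stretched cluster specific rule."""
    return policy.sftt is not None or policy.site_affinity is not None


def can_assign(policy: Policy, cluster_is_stretched: bool) -> bool:
    """Stretched-specific policies are refused on non-stretched clusters."""
    return cluster_is_stretched or not has_stretched_rules(policy)
```

Note that the check is one-directional: a policy without stretched rules can still be assigned in a stretched cluster, which is the scenario shown in Figure 3.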

SPBM policy recommendations for stretched clusters

The easiest way to accommodate a mix of stretched and non-stretched vSAN clusters is to have separate policies for stretched clusters. You could have policies that are exclusive to one specific vSAN stretched cluster, or build stretched cluster specific policies that can be applied to multiple stretched clusters. Depending on the topology, a blend of both strategies might fit your environment best: for example, cluster-specific policies for larger, purpose-built clusters, along with a single set of policies for all smaller branch offices. Additional policies can easily be created by cloning existing SPBM policies, modifying them accordingly, and then assigning them to the appropriate VMs. Having separate policies for VMs in stretched and non-stretched clusters is also good for a single-cluster environment where you need to tear down and recreate the stretched cluster.
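The clone-then-modify workflow can be modeled with plain data objects. This is only a sketch: the `Policy` class and both policy names are made up for illustration, and in practice the cloning is done in the vSphere Client or through whatever automation tooling you already use.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    ftm: str
    pftt: int
    sftt: Optional[int] = None  # only set in stretched cluster policies


# Existing policy used by VMs in non-stretched clusters (hypothetical name).
base = Policy(name="FTT1-RAID1", ftm="RAID-1", pftt=1)

# Clone it under a new name and add the stretched cluster specific rule.
# The original object is left untouched, so its VMs are not affected.
stretched = replace(base, name="Stretched-PFTT1-SFTT1-RAID5", ftm="RAID-5/6", sftt=1)
```

Encoding the settings in the policy name, as in this example, makes it easier to tell the stretched and non-stretched variants apart at a glance.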

Adjusting existing policies will always impact all VMs that are using the adjusted policy, whether they live in a stretched cluster, or non-stretched cluster. Adjustments in this scenario could introduce unnecessary resynchronization traffic when an administrator is trying to remediate an unexpected policy condition. This is another reason why dedicated SPBM policies for VMs running in stretched clusters are recommended.
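Before editing a shared policy, it helps to see which VMs would be touched. The sketch below assumes you already have an inventory of (VM, cluster, policy) assignments; the function name and all sample names are hypothetical.

```python
from collections import defaultdict


def vms_affected_by_edit(assignments, policy_name):
    """Group the VMs that use policy_name by the cluster they live in."""
    affected = defaultdict(list)
    for vm, cluster, policy in assignments:
        if policy == policy_name:
            affected[cluster].append(vm)
    return dict(affected)


# Hypothetical inventory: one shared policy spans two kinds of cluster.
assignments = [
    ("app01", "stretched-a", "Gold"),
    ("db01", "standard-b", "Gold"),
    ("web01", "stretched-a", "Stretched-Gold"),
]
print(vms_affected_by_edit(assignments, "Gold"))
```

Here an edit to the shared "Gold" policy would affect VMs in both clusters, while an edit to the dedicated "Stretched-Gold" policy stays within the stretched cluster.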

Summary

vSAN stretched clusters use SPBM to provide extraordinary levels of flexibility and granularity for any vSAN environment, and SPBM is one of the staples behind vSAN’s ease of use. Using separate policies for VMs in stretched clusters is a simple operational practice that can help virtualization administrators become more comfortable with introducing and managing one or more stretched clusters in a vSAN-powered environment.
