Auto-Policy Management Capabilities with the ESA in vSAN 8 U1

As with vSAN 8, the Express Storage Architecture (ESA) in vSAN 8 U1 introduces capabilities not possible with our original storage architecture (OSA). Our new “Auto-Policy Management” feature creates an optimized, cluster-specific default storage policy that helps administrators run workloads on an ESA cluster with the optimal level of resilience and efficiency. Let’s look in more detail at what this feature does, and why we created it.

Background

Data housed in a vSAN datastore is always stored in accordance with an assigned storage policy, which prescribes a level of data resilience and other settings. The assigned storage policy could be manually created, or it could be a default storage policy created by vSAN. Past versions of vSAN used a single “vSAN Default Storage Policy,” stored on the managing vCenter Server, as the policy to apply if an administrator had not defined and applied another one. Since this single policy was the default for all vSAN clusters managed by the vCenter Server, it used conservative settings, such as a failures to tolerate value of 1 (FTT=1) with simple RAID-1 mirroring, to be as compatible as possible with the size and capabilities of any cluster.

This meant that the default storage policy wasn’t always optimally configured for a given cluster. The types, sizes, and other characteristics of clusters can differ greatly, so a policy rule optimized for one cluster may not be ideal for, or even compatible with, another cluster. We wanted to address this, especially since the ESA eliminates compromises in performance between RAID-1 and RAID-5/6.

Auto-Policy Management for ESA

vSAN 8 U1 introduces a way in the ESA to ensure that data using a default storage policy is stored in an optimal, resilient way. The cluster service can be enabled by highlighting the cluster, clicking Configure > vSAN Services > Storage, then clicking “Edit” and enabling “Auto-Policy management.”

Figure 1. Enabling Auto-Policy management.

When enabled, a new cluster-specific default storage policy will be created on the managing vCenter Server. Because it is created for a specific cluster, it avoids imposing sub-optimal or incompatible settings on other vSAN clusters. The policy name will use a syntax similar to: Cluster Name -Optimal Datastore Default Policy – RAIDx

On the cluster where Auto-Policy Management is enabled, the policy assigned as the “Default Storage Policy” will change from “vSAN Default Storage Policy” to the new cluster-specific default storage policy. This setting can be found by clicking on the datastores icon, highlighting the vSAN datastore, clicking Configure > General, then viewing “Default Storage Policy.”

Figure 2. Default Storage Policy used for cluster adjusted to new dynamic policy.

At this point, any new VM created with “Datastore Default” selected as its storage policy will use the new cluster-optimized default storage policy.

Figure 3. Provisioning a new VM using the “Datastore Default” for the selected vSAN datastore.

Using Auto-Policy Management in an Environment Running ESA

With any type of configuration change, we recognize that many of our customers want to take a careful approach to automatic changes to a default storage policy. We also recognize that sometimes cluster changes may be temporary. So we’ve made this feature easy to introduce and operationalize.

First, the Auto-Policy Management feature is disabled by default on each ESA cluster, so that customers can become familiar with its operation by phasing it in on a per-cluster basis. When enabled, all VMs using the policy named “vSAN Default Storage Policy” or any other manually created storage policy will continue to do so. They will not be changed unless the administrator manually selects the new policy for that VM. To be clear, the Auto-Policy Management feature does NOT change any user-defined storage policies, or the policy named “vSAN Default Storage Policy” on the managing vCenter Server.

When the feature is enabled, it will monitor the host count of the cluster to detect if there are any additions or removals. Hosts in maintenance mode do not change the cluster host count. A new “health finding” (renamed from “health check” in previous versions of vSAN and vSphere) will be available in Skyline Health for vSAN called “vSAN optimal datastore default policy configuration.” It will show the current health as well as the health history of this cluster-specific, optimized default storage policy. If vSAN detects that the default storage policy created by the Auto-Policy Management feature is not set ideally for a given vSAN ESA cluster, it will trigger a health finding alert.

Figure 4. Skyline Health finding in an unhealthy state

Clicking “Troubleshoot” in the health finding shows the current value of the storage policy and the new suggested value. vSAN will not automatically change this storage policy; it is up to the user to make the change. This is our initial approach to giving administrators guidance on the best default storage policy settings for their cluster while keeping them confident that the default storage policy is not changing without their knowledge.

Figure 5. Guidance suggests the recommended new storage policy setting for the cluster-specific, default storage policy.

Once the configuration of the storage policy is adjusted by the administrator, the health finding will regain its healthy state, and any objects using that policy will be reconfigured to the new settings.
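
The monitor-and-suggest behavior described above can be sketched in a few lines of Python. This is only an illustration, not vSAN code or a vSAN API: the names Host, cluster_host_count, and evaluate_finding are hypothetical, and policies are represented as plain strings for simplicity. It shows two points from the text: hosts in maintenance mode still count toward the cluster host count, and a mismatch produces a suggestion rather than an automatic change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical stand-in for a cluster member."""
    name: str
    in_maintenance_mode: bool = False


def cluster_host_count(hosts: List[Host]) -> int:
    # Per the article, entering maintenance mode does not change the
    # host count; only adding or removing hosts does.
    return len(hosts)


def evaluate_finding(current: str, recommended: str) -> dict:
    """Compare the current default policy with the recommended one.

    Nothing is changed here: the result only reports a suggestion,
    mirroring the fact that the administrator must apply the change.
    """
    healthy = current == recommended
    suggested: Optional[str] = None if healthy else recommended
    return {"healthy": healthy, "current": current, "suggested": suggested}


if __name__ == "__main__":
    hosts = [Host("esx01"), Host("esx02", in_maintenance_mode=True), Host("esx03")]
    print(cluster_host_count(hosts))
    print(evaluate_finding("FTT=1 using RAID-1", "FTT=1 using RAID-5 (2+1)"))
```

Once the administrator edits the policy so that it matches the suggestion, the same comparison reports a healthy state.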

Recommendation: When making the recommended changes to the storage policy, you may also want to adjust the name to keep it consistent with the RAID configuration used.

Configuration Logic for Optimized Storage Policy for Cluster

The settings used by the optimized storage policy are based on the type of cluster, the number of hosts in the cluster, and whether the Host Rebuild Reserve (HRR) capacity management feature is enabled on the cluster. A change to any one of the three will result in vSAN suggesting an adjustment to the cluster-specific, optimized storage policy. Note that the Auto-Policy Management feature is currently not supported when using the vSAN Fault Domains feature.

  • Standard vSAN clusters (with Host Rebuild Reserve turned off):
    • 3 hosts without HRR: FTT=1 using RAID-1
    • 4 hosts without HRR: FTT=1 using RAID-5 (2+1)
    • 5 hosts without HRR: FTT=1 using RAID-5 (2+1)
    • 6 or more hosts without HRR: FTT=2 using RAID-6 (4+2)
  • Standard vSAN clusters (with Host Rebuild Reserve enabled):
    • 3 hosts with HRR: (HRR not supported with 3 hosts)
    • 4 hosts with HRR: FTT=1 using RAID-1
    • 5 hosts with HRR: FTT=1 using RAID-5 (2+1)
    • 6 hosts with HRR: FTT=1 using RAID-5 (4+1)
    • 7 or more hosts with HRR: FTT=2 using RAID-6 (4+2)
  • vSAN Stretched clusters:
    • 3 data hosts at each site: Site level mirroring with FTT=1 using RAID-1 mirroring for a secondary level of resilience
    • 4 hosts at each site: Site level mirroring with FTT=1 using RAID-5 (2+1) for a secondary level of resilience
    • 5 hosts at each site: Site level mirroring with FTT=1 using RAID-5 (2+1) for a secondary level of resilience
    • 6 or more hosts at each site: Site level mirroring with FTT=2 using RAID-6 (4+2) for a secondary level of resilience
  • vSAN 2-Node clusters:
    • 2 data hosts: Host level mirroring using RAID-1
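
The list above can be read as a simple lookup. The following Python sketch encodes it as a function; it is an illustration of the published table only, not how vSAN implements the feature. The function name recommend_default_policy and the cluster type labels "standard", "stretched", and "two_node" are hypothetical. For stretched clusters the hosts argument is assumed to mean data hosts per site, and HRR is ignored there because the list does not describe it for that topology.

```python
from typing import Optional


def _standard(hosts: int, hrr: bool) -> Optional[str]:
    """Setting for a standard cluster, or None if the list has no entry."""
    if hrr:
        if hosts < 4:
            return None  # HRR is not supported with 3 hosts
        if hosts == 4:
            return "FTT=1 using RAID-1"
        if hosts == 5:
            return "FTT=1 using RAID-5 (2+1)"
        if hosts == 6:
            return "FTT=1 using RAID-5 (4+1)"
        return "FTT=2 using RAID-6 (4+2)"
    if hosts < 3:
        return None
    if hosts == 3:
        return "FTT=1 using RAID-1"
    if hosts <= 5:
        return "FTT=1 using RAID-5 (2+1)"
    return "FTT=2 using RAID-6 (4+2)"


def recommend_default_policy(cluster_type: str, hosts: int,
                             hrr: bool = False) -> Optional[str]:
    """Return the setting from the list above, or None if not covered."""
    if cluster_type == "standard":
        return _standard(hosts, hrr)
    if cluster_type == "stretched":
        # Secondary resilience follows the same thresholds as the
        # non-HRR standard case, counted per site.
        secondary = _standard(hosts, False)
        if secondary is None:
            return None
        return "Site level mirroring with " + secondary
    if cluster_type == "two_node":
        return "Host level mirroring using RAID-1" if hosts == 2 else None
    raise ValueError(f"unknown cluster type: {cluster_type}")


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, recommend_default_policy("standard", n, hrr=True))
```

Calling the function again after a host is added or HRR is toggled yields the new suggested setting, which is the value an administrator would see as guidance in the health finding.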

You might notice that in the configuration logic described above, the Auto-Policy Management setting for a three-host cluster does not use RAID-5 erasure coding and uses RAID-1 mirroring instead. This setting helps three-host clusters that run in an error state (with only two hosts available) for an extended period: it ensures that a complete copy of an object resides on two of the three hosts, and it avoids the need for ongoing inline calculations to account for data on an absent host. This is the initial setting for the Auto-Policy Management capability and may be subject to change. If you would like to use RAID-5 erasure coding on a three-host cluster, you can manually create a storage policy and use it for these conditions. As with past versions of vSAN, we highly encourage a cluster host count that is at least one host larger than the minimum number of hosts required for a storage policy.

With the introduction of the Auto-Policy Management feature, as well as Adaptive RAID-5 Erasure Coding with the ESA in vSAN 8, you might notice a growing difference between the minimum number of hosts that a given RAID scheme uses for data placement and the minimum number of hosts that Auto-Policy Management requires before it selects that RAID scheme for a cluster in a non-error state. This is especially noticeable when vSAN’s Host Rebuild Reserve is enabled on a cluster. The objective is to promote good N+1 practices so that if a failure degrades the prescribed resilience, another host is available to regain the prescribed level of resilience.
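
To make that gap concrete, the sketch below compares the number of hosts an erasure-coding stripe occupies with the host count at which the list earlier in this article first selects it. It rests on an assumption not stated in this article: that each data and parity fragment of a stripe lands on a separate host, so a "d+p" scheme needs d + p hosts for placement. The names placement_width, FIRST_USED_AT, and spare_hosts are hypothetical, and RAID-1 is left out because it is not expressed in d+p form.

```python
def placement_width(stripe: str) -> int:
    """Hosts needed to place one stripe, assuming one fragment per host."""
    data, parity = stripe.split("+")
    return int(data) + int(parity)


# Smallest standard-cluster host count at which the list in this article
# selects each scheme. RAID-5 (4+1) only appears in the HRR-enabled case.
FIRST_USED_AT = {
    "2+1": {"no_hrr": 4, "hrr": 5},
    "4+1": {"hrr": 6},
    "4+2": {"no_hrr": 6, "hrr": 7},
}


def spare_hosts(stripe: str, mode: str) -> int:
    """Hosts beyond the placement width when the scheme is first selected."""
    return FIRST_USED_AT[stripe][mode] - placement_width(stripe)


if __name__ == "__main__":
    for stripe, modes in FIRST_USED_AT.items():
        for mode in modes:
            print(stripe, mode, spare_hosts(stripe, mode))
```

Under these assumptions, every scheme selected with HRR enabled leaves at least one host beyond what the stripe itself occupies, which is the N+1 headroom the paragraph above describes.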

Summary

The new Auto-Policy Management feature in vSAN 8 U1 serves as a building block to make vSAN ESA clusters even more intelligent, and easier to use. It gives our customers confidence that resilience settings for their environment are optimally configured.

@vmpete