A primary responsibility of storage is to make requested data readily available. While this may seem obvious, storage systems go to great lengths to accomplish this clear but challenging objective. Hyper-converged infrastructures factor in additional considerations for data placement and resilience, as the storage is distributed across hosts. Considerations include, but are not limited to: Where should the data be placed initially? How is the data dispersed in a way that makes it resilient? Is the data sufficiently balanced? And how do ongoing operations such as host maintenance and changes in requirements impact where the data is placed?
vSAN 6.7 enhanced the data placement and management scheme with new functionality known as "Replica Component Consolidation." This enhancement is a continuation of the similar "Intelligent Rebuilds" optimizations made in vSAN 6.6. Why do these enhancements matter to the typical vSAN administrator? Optimizations in data placement and management can mean less data movement. This type of data movement in vSAN is commonly known as resynchronization. Less data movement translates to faster decommissioning of hosts or disk groups during full evacuations, reduced consumption of temporary space, and better agility when freeing up a fault domain such as a host or a disk group.
A refresher on object storage for vSAN
To understand the optimizations made, let’s take a moment to review vSAN's approach to storing data. vSAN's object-based storage system treats elements of a VM (such as a VMDK) as an object. Users assign storage policies to these objects that reflect the desired protection and performance outcomes. vSAN breaks these objects into smaller units called "components" as represented in Figure 1. These components are simply an implementation detail of vSAN and are completely transparent during the administration process.
Figure 1. Object and Components relationship
For the end user, the administration is at an object level. Understanding the role of object components simply provides a better understanding for administrators who want to know more about operational behaviors of vSAN.
Data Placement Basics
vSAN considers several factors for the initial and ongoing placement of data. All the considerations go well beyond the scope of this post, but here are a few basics about data management on vSAN.
vSAN governs the placement of these components to ensure compliance with the storage policies. For example, using a policy with a Failure Tolerance Method (FTM) of RAID-1 mirroring and a Failures to Tolerate (FTT) value of 1 means there are two object replicas, and the components that comprise each replica will not overlap on the hosts in a way that compromises the availability of the object beyond a single failure, as represented in Figure 1. vSAN accomplishes this through a form of anti-affinity for object replica components.
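The anti-affinity idea can be illustrated with a small sketch. This is a toy model and not vSAN's actual placement logic: the `Placement` mapping, the function names, and the host names are all hypothetical, and details such as witness components and capacity are ignored. It simply checks that no host holds components from more than one replica of the same object.

```python
from typing import Dict, List, Set

# Hypothetical toy model: replica name -> hosts holding that replica's components.
Placement = Dict[str, List[str]]


def shared_hosts(placement: Placement) -> Set[str]:
    """Return the hosts that hold components from more than one replica."""
    first_owner: Dict[str, str] = {}  # host -> first replica seen on that host
    shared: Set[str] = set()
    for replica, hosts in placement.items():
        for host in set(hosts):
            owner = first_owner.setdefault(host, replica)
            if owner != replica:
                shared.add(host)
    return shared


def satisfies_anti_affinity(placement: Placement) -> bool:
    """True when no single host failure can affect two replicas at once."""
    return not shared_hosts(placement)


# Two replicas (RAID-1, FTT=1) kept on separate hosts.
compliant = {"replica_a": ["host1", "host1"], "replica_b": ["host2", "host3"]}
# Both replicas have a component on host2, so losing host2 hurts both.
overlapping = {"replica_a": ["host1", "host2"], "replica_b": ["host2", "host3"]}

print(satisfies_anti_affinity(compliant))  # True
print(shared_hosts(overlapping))           # {'host2'}
```

In the second placement a single failure of host2 would affect both replicas, which is exactly the condition the policy is meant to prevent.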
Other factors beyond the FTM and FTT policy rules can influence where data may be placed. Stripe width and object space reservation policy rules, as well as environmental factors such as cluster host count, the growth of thin provisioned VMDKs and physical device capacity usage, can impart changes in how and where components live on hosts across a vSAN cluster.
During initial data placement or any object tree rebuild, the components that comprise an object replica are placed on the hosts in the cluster. vSAN attempts to keep the number of fault domains used by each replica to a minimum. Under some circumstances vSAN cannot do so and places a replica's components across more than one host. Figure 2 illustrates this: one of the object replicas uses the minimum number of fault domains (a single host) while the other is spread across more than one host. vSAN manages these conditions and ensures that the placement does not violate any of the assigned policy settings.
Figure 2. An object replica spread across multiple fault domains versus a single fault domain
Data movement due to operational events
Operational events such as a full host evacuation may require vSAN to reconstruct object components to satisfy the policy. Reconstructing an object replica would consist of creating all new components in a new object tree for the entire mirrored replica: a task that requires resynchronization activity and temporary space. In versions prior to vSAN 6.6, this was the default behavior for evacuation scenarios.
vSAN 6.6 introduced a new method to help in these evacuations. New logic aimed to keep resynchronization traffic and temporary space usage to a minimum during these evacuation operations by moving as little as a single component to a location - any location - on a host in the cluster that doesn't violate the anti-affinity rules with components representing the other object replicas. This method is shown in Figure 3, where the host in orange is being decommissioned.
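To get a feel for why moving a single component matters, here is a back-of-the-envelope sketch comparing the data resynchronized by a full replica rebuild against moving only the components that live on the host being evacuated. The component sizes, host names, and the `resync_cost_gb` helper are made up for illustration and do not come from vSAN.

```python
from typing import List, Tuple

# Hypothetical replica layout: (host, component size in GB).
ReplicaLayout = List[Tuple[str, int]]


def resync_cost_gb(replica: ReplicaLayout, evacuating_host: str) -> Tuple[int, int]:
    """Return (full_rebuild_gb, component_move_gb) for one object replica."""
    # A rebuild recreates every component of the replica in a new object tree.
    full_rebuild = sum(size for _, size in replica)
    # A component move only copies what sits on the host being evacuated.
    component_move = sum(size for host, size in replica if host == evacuating_host)
    return full_rebuild, component_move


replica = [("host1", 100), ("host2", 50), ("host3", 20)]
rebuild_gb, move_gb = resync_cost_gb(replica, "host3")
print(f"rebuild: {rebuild_gb} GB, component move: {move_gb} GB")
# rebuild: 170 GB, component move: 20 GB
```

In this made-up layout the rebuild copies the whole replica while the component move copies only the 20 GB that actually has to leave the host. Temporary space usage follows the same pattern.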
Figure 3. Efficient component movement to a free fault domain during a host decommissioning
This dramatically reduced the resynchronization traffic and temporary space necessary for host or disk group evacuation. As a result of this method, however, object replicas may be spread across more hosts than necessary. While this accelerated evacuations, a subsequent evacuation could struggle to find an eligible host to move data to without violating the storage policy (especially in smaller clusters). In those conditions, vSAN would fall back to the original behavior of rebuilding the object tree: a more resource-intensive operation.
Replica Component Consolidation
vSAN 6.7 optimizes this approach even further. A third method is introduced, referred to as "Replica Component Consolidation." It consolidates object replica components when they are spread across more than one fault domain, which reduces the likelihood of a more resource-intensive object tree rebuild. It aims to adjust the placement of components in a way that maintains compliance with the desired policy conditions while minimizing data movement.
Figure 4. The replica component consolidation method during host decommissioning
In the example of a full host or disk group evacuation, vSAN 6.7 approaches the task in a specific order:
Method #1: vSAN will attempt to find another fault domain - any fault domain - in a cluster that doesn't violate the anti-affinity rules with components representing the other object replicas. If it cannot achieve this, it will proceed to method #2.
Method #2: Replica consolidation begins by looking for a replica that spans two or more fault domains and attempts to consolidate its components by moving the smallest component, which requires the least amount of effort. Upon success, it will proceed back to method #1. If it fails, it will proceed to method #3.
Method #3: Initiate an object tree rebuild, which will achieve a result similar to consolidation, but using more resources.
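The order of the three methods above can be sketched as a small decision routine. This is only an illustration under simplifying assumptions, not vSAN's implementation: the `Component` class, the function names, and the three-host example are hypothetical, and the sketch ignores capacity, witness components, and disk groups.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Component:
    name: str
    host: str
    size_gb: int


Placement = Dict[str, List[Component]]  # replica name -> its components


def free_host(placement: Placement, replica: str, evacuating: str,
              all_hosts: List[str]) -> Optional[str]:
    """Method 1: any host not being evacuated and not used by another replica."""
    taken = {c.host for r, comps in placement.items() if r != replica for c in comps}
    for host in sorted(all_hosts):
        if host != evacuating and host not in taken:
            return host
    return None


def consolidate(placement: Placement, evacuating: str) -> Optional[str]:
    """Method 2: pull the smallest component of a spread-out replica onto
    another host that replica already uses. Returns a step description."""
    for comps in placement.values():
        hosts = {c.host for c in comps}
        if len(hosts) < 2 or evacuating in hosts:
            continue  # nothing to consolidate, or replica touches the leaving host
        smallest = min(comps, key=lambda c: c.size_gb)
        target = next(c.host for c in comps if c.host != smallest.host)
        step = f"consolidate {smallest.name}: {smallest.host} -> {target}"
        smallest.host = target
        return step
    return None


def plan_evacuation(placement: Placement, replica: str, component: Component,
                    evacuating: str, all_hosts: List[str]) -> List[str]:
    steps: List[str] = []
    target = free_host(placement, replica, evacuating, all_hosts)
    if target is None:
        step = consolidate(placement, evacuating)
        if step is not None:
            steps.append(step)
            # Back to method 1 after a successful consolidation.
            target = free_host(placement, replica, evacuating, all_hosts)
    if target is None:
        # Method 3: fall back to the more expensive rebuild.
        steps.append(f"rebuild object tree for {replica}")
        return steps
    steps.append(f"move {component.name}: {evacuating} -> {target}")
    component.host = target
    return steps


hosts = ["host1", "host2", "host3"]
a1 = Component("a1", "host1", 10)
placement = {
    "replica_a": [a1],
    "replica_b": [Component("b1", "host2", 10), Component("b2", "host3", 2)],
}
plan = plan_evacuation(placement, "replica_a", a1, "host1", hosts)
print(plan)
# ['consolidate b2: host3 -> host2', 'move a1: host1 -> host3']
```

In this example no host is free at first because the second replica occupies both remaining hosts. Consolidating its 2 GB component frees host3, which lets the cheap single-component move succeed without a rebuild.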
Why doesn't vSAN always perform a replica component consolidation as the first step? Replica consolidation is an additional step that allows the highly efficient method #1 to run. Consolidation is used only when there isn't a free fault domain, in an attempt to avoid a rebuild of the object tree. The approach described emphasizes achieving the desired result using the least amount of effort (resources) possible.
Replica component consolidation in vSAN 6.7 represents an enhanced level of intelligence to manage the ongoing placement of object components in the most efficient way possible. And just as with previous editions of vSAN, all of these benefits simply come as a part of updating your hypervisor.