
Observing Capacity Changes in vSAN

When defining and applying storage policies to objects in a vSAN cluster, you may notice that some policy definitions will impact the amount of storage capacity consumed by the objects (VMs, VMDKs, etc.) that are assigned the given policy. Let’s explore why this happens, and what to look for in your environment.

Policies and their impact on consumed capacity

Unlike traditional storage systems, vSAN allows you to configure both the level of resilience (e.g. Failures to Tolerate, or FTT) and the data placement scheme (RAID-1 mirroring or RAID-5/6 erasure coding), which determines space efficiency. These settings are defined in a storage policy, which can be assigned to a group of VMs, a single VM, or even a single VMDK.

Changes in capacity resulting from storage policy adjustments can be temporary or permanent.

  • Temporary: Temporary, or transient, space is consumed when an object changes from one data placement scheme to another. vSAN builds a new copy of the data (a process known as resynchronization) that complies with the newly assigned policy, and that copy replaces the old one. For example, changing a VM from a RAID-1 mirror to a RAID-5 erasure code consumes space while a new copy of the data is created using the RAID-5 scheme. Once the resynchronization is complete, the copy of the data under the RAID-1 mirror is deleted, reclaiming the temporary space used for the change in storage policy.
  • Permanent: Permanent space is consumed when applying a storage policy with a higher level of failures to tolerate (e.g. FTT=1 to FTT=2), or when changing from an erasure code to a mirror (e.g. RAID-5 to RAID-1). This additional capacity is consumed once the policy change has completed (a process that itself uses temporary capacity), and it remains for as long as the object is assigned that storage policy.
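The permanent differences come from how much raw capacity each data placement scheme consumes per unit of logical data. The sketch below captures this as a simple lookup table. It assumes vSAN OSA-style layouts (RAID-5 as a 3+1 stripe, RAID-6 as a 4+2 stripe) and ignores witness components and metadata overhead; the names `RAW_MULTIPLIER` and `raw_capacity_gb` are hypothetical and not part of any vSAN API. vSAN ESA can use different RAID-5 stripe widths, so the multipliers would need adjusting for that architecture.

```python
# Hypothetical raw-capacity multipliers for common vSAN policy layouts.
# Assumption: OSA-style stripes (RAID-5 = 3 data + 1 parity, RAID-6 = 4 data + 2 parity).
# Witness components and metadata overhead are ignored.
RAW_MULTIPLIER = {
    ("RAID-1", 1): 2.0,    # two full mirror copies
    ("RAID-1", 2): 3.0,    # three full mirror copies
    ("RAID-1", 3): 4.0,    # four full mirror copies
    ("RAID-5", 1): 4 / 3,  # 3 data + 1 parity
    ("RAID-6", 2): 6 / 4,  # 4 data + 2 parity
}


def raw_capacity_gb(logical_gb: float, scheme: str, ftt: int) -> float:
    """Raw datastore capacity consumed by an object of `logical_gb` under a policy."""
    try:
        return logical_gb * RAW_MULTIPLIER[(scheme, ftt)]
    except KeyError:
        raise ValueError(f"unsupported combination: {scheme} with FTT={ftt}") from None
```

For a 100 GB VMDK, `raw_capacity_gb(100, "RAID-1", 1)` returns 200.0 and `raw_capacity_gb(100, "RAID-1", 2)` returns 300.0, which is the kind of permanent increase described in the bullet above.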

To better understand how both temporary and permanent changes occur as a result of a policy change, let’s look at Figure 1. Here we see an object that transitions from a RAID-1 mirror to a RAID-5 erasure code. Both schemes provide a failures to tolerate (FTT) level of 1, but the transition to RAID-5 means that temporary space (shown in the illustration as “overhead”) is consumed to achieve this result. Once the transition completes, the temporary space is released, and the object consumes less overall capacity because RAID-5 is a more space-efficient data placement scheme than the original RAID-1 mirror.

Figure 1. Understanding how a change in storage policy will affect storage capacity.
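The transition in Figure 1 can also be approximated numerically. The following is a minimal sketch, assuming a worst case in which the complete old RAID-1 layout and the complete new RAID-5 (3+1) layout coexist until resynchronization finishes; `policy_change_timeline` and the policy labels are hypothetical. The actual peak in a vSAN cluster may differ depending on how components are resynchronized.

```python
# Raw GB consumed per logical GB (assumes OSA-style RAID-5 as a 3+1 stripe).
MULTIPLIER = {"RAID-1/FTT=1": 2.0, "RAID-5/FTT=1": 4 / 3}


def policy_change_timeline(logical_gb: float, old_policy: str, new_policy: str) -> dict:
    """Approximate raw usage before, during, and after a policy change for one object."""
    before = logical_gb * MULTIPLIER[old_policy]
    after = logical_gb * MULTIPLIER[new_policy]
    # Worst-case assumption: the old layout is still present while the new one is built.
    peak = before + after
    return {
        "before_gb": before,
        "peak_gb": peak,
        "temporary_overhead_gb": peak - before,  # the "overhead" band in Figure 1
        "after_gb": after,
        "net_change_gb": after - before,         # negative means capacity is reclaimed
    }
```

For a 100 GB object, this yields 200 GB before the change, a peak of about 333 GB, and about 133 GB afterward, a net reduction of roughly 67 GB.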

The amount of temporary and permanent space consumed by a storage policy change depends on how many objects are changed at the same time and how much capacity those objects use. The temporary space is consumed by the resulting resynchronizations. Because the capacity an object consumes is dictated by the storage policy assigned to it, vSAN presents the raw capacity provided by the datastore, as observed in vCenter, VMware Aria Operations, and PowerCLI.
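Because every changed object needs its own temporary copy, the number of objects changed at once drives the size of the temporary peak. The sketch below contrasts two illustrative bounds, assuming each object's old and new layouts coexist only while that object resynchronizes: all objects resynchronizing concurrently, and one object at a time. It is not a model of how vSAN actually schedules resynchronization, and `peak_usage_gb` is a hypothetical helper that counts only the objects being changed.

```python
from typing import Iterable, Tuple


def peak_usage_gb(objects: Iterable[Tuple[float, float]], concurrent: bool) -> float:
    """Peak raw usage for (old_raw_gb, new_raw_gb) pairs of objects changing policy."""
    objs = list(objects)
    current = sum(old for old, _ in objs)  # raw usage before any change starts
    if concurrent:
        # Worst case: every new copy is built before any old copy is removed.
        return current + sum(new for _, new in objs)
    peak = current
    for old, new in objs:
        # The new copy is built alongside the old one for this object only.
        peak = max(peak, current + new)
        # The old copy is then deleted and the new copy remains.
        current += new - old
    return peak
```

With ten 100 GB objects moving from RAID-1 (200 GB raw each) to RAID-5 (about 133 GB raw each), the concurrent case peaks at about 3,333 GB while the one-at-a-time case peaks at about 2,133 GB.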

Estimating Usage

The vSAN performance service provides an easy-to-use tool to help estimate the free usable capacity for a desired policy. Simply select the storage policy, and the tool estimates the amount of free usable capacity with that policy. It does not account for the slack space that VMware recommends keeping free.

Figure 2. The free capacity with policy calculator in the vSAN UI found in vCenter.

While the estimator is limited to an estimate based on a single policy, it can be extremely useful for understanding how much usable capacity remains in a cluster.
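The idea behind a free-capacity-with-policy estimate can be approximated by dividing free raw capacity by the policy's raw-capacity multiplier. This is a sketch under that simplifying assumption, not the calculation vCenter performs. Because the built-in estimator does not account for slack space, the hypothetical `reserve_gb` parameter lets you subtract whatever reserve you plan to keep free.

```python
def free_usable_with_policy_gb(free_raw_gb: float, multiplier: float,
                               reserve_gb: float = 0.0) -> float:
    """Approximate usable capacity left for new data written with one policy.

    free_raw_gb: free raw capacity of the vSAN datastore
    multiplier:  raw GB consumed per logical GB (e.g. 2.0 for RAID-1, FTT=1)
    reserve_gb:  hypothetical slack reserve you choose to keep free
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    # Capacity set aside as a reserve cannot hold new objects.
    available_raw = max(free_raw_gb - reserve_gb, 0.0)
    # Each logical GB written under the policy consumes `multiplier` raw GB.
    return available_raw / multiplier
```

With 1,000 GB of free raw capacity, this returns 500 GB for a RAID-1 (FTT=1) policy and about 750 GB for a RAID-5 (3+1) policy, which is why the estimate changes as you select different policies.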

Observing Changed Usage as a Result of Storage Policy Changes

There are multiple ways to view storage capacity changes. The following example shows how to observe capacity changes caused by storage policy changes using both vCenter and VMware Aria Operations.

In this example, a group of VMs using a storage policy with FTT=1 via a RAID-1 mirror was changed to another storage policy with FTT=1 via a RAID-5 erasure coding scheme. In vCenter, selecting a vSAN cluster and navigating to Monitor > vSAN > Performance > Backend reveals the resynchronization activity that occurred as a result of the policy change, as shown in Figure 3.

Figure 3. Observing resynchronization I/O activity as a result of a change in storage policies

The capacity history in vCenter shows that the policy change temporarily consumed additional space to build the new RAID-5 based objects. Once the resynchronization is complete, the old object data is removed, deduplication and compression take effect, and free capacity is reclaimed. Figure 4 below shows how this is presented in vCenter.

Figure 4. Using vCenter to observe cluster capacity utilization as a result of a resynchronization event
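When reading a capacity history chart such as the one in Figure 4, two numbers matter: how far usage rose above its starting point during resynchronization, and where it settled afterward. The sketch below computes both from a series of used-capacity values, assuming those values have already been gathered in chronological order; `summarize_capacity_history` and the sample values are hypothetical and are not taken from Figure 4.

```python
from typing import Dict, Sequence


def summarize_capacity_history(used_gb: Sequence[float]) -> Dict[str, float]:
    """Summarize chronologically ordered used-capacity samples (GB) around a policy change."""
    if not used_gb:
        raise ValueError("need at least one sample")
    baseline, final = used_gb[0], used_gb[-1]
    peak = max(used_gb)
    return {
        "baseline_gb": baseline,
        "peak_gb": peak,
        # Space consumed above the starting point while resynchronization ran.
        "transient_gb": peak - baseline,
        # Lasting difference after old data is removed and DD&C take effect.
        "net_change_gb": final - baseline,
    }


# Illustrative values only.
history = [2000.0, 2400.0, 2900.0, 3300.0, 2600.0, 1400.0]
summary = summarize_capacity_history(history)
```

For these illustrative samples, usage rose 1,300 GB above the baseline during resynchronization and settled 600 GB below it, matching the temporary-then-reclaimed pattern described above.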

The Cluster Utilization widget in the vSAN Capacity Overview dashboard found in VMware Aria Operations shows the same results, as Figure 5 illustrates. VMware Aria Operations also offers additional details via context-sensitive “sparklines” that give precise breakdowns of deduplication and compression savings, and of storage utilization with and without deduplication and compression.

Figure 5. Using VMware Aria Operations to observe cluster capacity utilization as a result of a resynchronization event

Note that different views may present the same data differently for three reasons: the time window presented on the X-axis, the range of values shown on the Y-axis, and the scaling of the X and Y axes. This is why the same data may look different visually, even though the metrics are consistent across the various applications and interfaces.

Recommendation: Look at the overall capacity consumed after a storage policy change, rather than only the deduplication and compression ratio. Space efficiency techniques like erasure codes may result in a lower deduplication and compression ratio, yet still improve space efficiency by reducing consumed space. For more information on this topic, see “Analyzing Capacity Utilization with VMware Aria Operations”.
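The reasoning behind this recommendation can be checked with simple arithmetic. The sketch below assumes the deduplication and compression ratio is defined as capacity used before DD&C divided by capacity used after DD&C, and compares two hypothetical scenarios for the same set of VMs; the numbers are invented for illustration and are not measurements from Figure 5.

```python
def ddc_ratio(used_before_ddc_gb: float, used_after_ddc_gb: float) -> float:
    """Deduplication and compression ratio: capacity before DD&C divided by capacity after."""
    return used_before_ddc_gb / used_after_ddc_gb


# Hypothetical figures for the same VMs under two policies (FTT=1 in both cases).
raid1 = {"before_ddc_gb": 2000.0, "after_ddc_gb": 1000.0}
raid5 = {"before_ddc_gb": 1333.0, "after_ddc_gb": 800.0}

for name, usage in (("RAID-1", raid1), ("RAID-5", raid5)):
    ratio = ddc_ratio(usage["before_ddc_gb"], usage["after_ddc_gb"])
    # The ratio alone hides the fact that consumed capacity is what actually matters.
    print(f"{name}: ratio {ratio:.2f}x, consumed {usage['after_ddc_gb']:.0f} GB")
```

In this hypothetical case, RAID-5 reports the lower ratio (about 1.67x versus 2.00x) while consuming less capacity (800 GB versus 1,000 GB).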

Summary

Storage policies allow an administrator to establish various levels of protection and space efficiency across a selection of VMs, a single VM, or even a single VMDK. Assigning different storage policies to objects will impact the amount of effective space they consume across a vSAN datastore. Both vCenter and VMware Aria Operations provide ways to help administrators better understand storage capacity consumption across the vSAN cluster.

@vmpete