
RAID-5/6 Erasure Coding Enhancements in vSAN 7 U2

Anyone who pays attention to the improvements in performance that VMware makes to vSphere and vSAN has a lot to read about. Entire teams at VMware are dedicated to ensuring that all code changes perform equal to or better than the previous build. This is quite an astonishing feat considering VMware is always adding new data services and capabilities. Developers work with these teams closely to analyze test data, as well as telemetry data through vSAN Support Insight, and look for opportunities to optimize the stack. New hardware pushes the boundaries of the software, which gives VMware opportunities to pass along those benefits to the users.

Several factors contribute to the effective performance as seen by the VM. Storage devices and the network are two of the most significant influences on performance, especially when writing data redundantly in a distributed storage system like vSAN. The goal for vSphere and vSAN is to determine the most efficient way to deliver data where it needs to go to maintain resilience. As hardware capabilities improve, so must the hypervisor.

A Primer on Data Placement Schemes in vSAN

vSAN achieves resilience of data in different ways. One way is by placing a copy, or mirror, of a chunk of data (an object in vSAN) on one or more other hosts. This replica of an object resides somewhere else in the cluster to provide resilience. The level of resilience is defined in the assigned storage policy, and vSAN takes care of the rest, placing the replicas in the cluster to achieve the desired result. A failures to tolerate level of 1 (FTT=1) when using RAID-1 mirroring creates two copies of that object. FTT=2 creates three copies of that object, and FTT=3 creates four copies of the object.

Figure 1. Object data resilience through RAID-1 mirroring.

Data mirroring is a simple data placement scheme that uses minimal computational overhead but comes with the tradeoff of using an equal amount of capacity somewhere else in the cluster to protect the object at the level of resilience you desire.

The other way vSAN achieves resilience is through the use of erasure codes. Erasure coding is a method of fragmenting data across some physical boundary in a manner that maintains access to the data if one or more fragments are missing. In vSAN’s case, erasure codes do this by striping data with parity across hosts. Unlike a RAID-1 mirror, where there are two or more copies of the data, there is only a single instance of an object using RAID-5/6 erasure coding. The data with parity is spread across the hosts to provide this resilience. An object assigned FTT=1 using erasure coding (RAID-5) will maintain availability in the event of a single failure (e.g., a host) and will spread that data across 4 hosts. An object assigned FTT=2 using erasure coding (RAID-6) will maintain availability in the event of a double failure, spreading that data across 6 hosts. As a reminder, an object using erasure coding is not spread across all hosts. vSAN’s approach offers superior resilience under failure conditions and simplified scalability.

Figure 2. Object data resilience through RAID-5 erasure coding.
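To make the idea of "data with parity" concrete, here is a minimal sketch of a single 3+1 stripe row using XOR parity, the textbook basis of RAID-5. The function names (`make_stripe`, `rebuild`) are hypothetical, and the sketch is not vSAN's implementation: it ignores how vSAN distributes parity across hosts, and RAID-6 needs a second, differently computed parity fragment that is not shown here.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data_fragments: List[bytes]) -> List[bytes]:
    """Return the data fragments followed by one XOR parity fragment."""
    parity = reduce(xor_bytes, data_fragments)
    return list(data_fragments) + [parity]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = [i for i, frag in enumerate(stripe) if frag is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing fragment")
    survivors = [frag for frag in stripe if frag is not None]
    rebuilt = list(stripe)
    # XOR of all surviving fragments equals the missing one.
    rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt


# Three data fragments plus parity: one fragment per host in a 3+1 layout.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
degraded = list(stripe)
degraded[1] = None  # simulate losing the host that holds fragment 1
assert rebuild(degraded) == stripe
```

Because any one fragment can be recomputed from the other three, the object stays available after a single host failure without a second full copy of the data.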

The benefit of erasure codes is predictable space efficiency when compared to mirroring data. Providing resilience for a single failure (FTT=1) for an object using erasure coding consumes just 1.33x the capacity of the single object. Providing resilience for a double failure (FTT=2) for an object using erasure coding consumes just 1.5x the capacity of the single object. With mirroring, FTT=1 and FTT=2 would consume 2x and 3x the capacity, respectively.
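The multipliers above follow directly from the layouts: a mirror stores FTT+1 full copies, while a stripe stores (data + parity) fragments for every `data` fragments of useful capacity. The following sketch reproduces the arithmetic; the function names and the 100 GB example object are illustrative assumptions.

```python
def mirror_multiplier(ftt: int) -> float:
    """RAID-1 keeps FTT + 1 full copies of the object."""
    return float(ftt + 1)


def erasure_multiplier(data_fragments: int, parity_fragments: int) -> float:
    """A stripe stores data + parity fragments per `data_fragments` of payload."""
    return (data_fragments + parity_fragments) / data_fragments


object_gb = 100  # hypothetical object size
layouts = {
    "RAID-1, FTT=1": mirror_multiplier(1),
    "RAID-5 (3+1), FTT=1": erasure_multiplier(3, 1),
    "RAID-1, FTT=2": mirror_multiplier(2),
    "RAID-6 (4+2), FTT=2": erasure_multiplier(4, 2),
}
for name, multiplier in layouts.items():
    print(f"{name}: {multiplier:.2f}x -> {object_gb * multiplier:.1f} GB consumed")
```

For the same FTT=1 protection, the 3+1 stripe consumes roughly a third less raw capacity than the mirror; at FTT=2, the 4+2 stripe consumes half of what three mirrored copies would.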

While erasure coding is extremely efficient with capacity, it does consume additional CPU and network resources. For all new data written or updated, calculations must occur to complete the stripe with parity, which leads to an inherent amplification of I/Os to read the data, calculate the parity, and distribute the data and parity across hosts. The result is that this can impact the performance as seen by the guest VM. The amount will depend on the host hardware, the networking hardware, and the characteristics of the workload.

Recommendation: If you aren’t on 25/100Gb Ethernet already, you should be planning for it. Modern storage devices and the hosts they live in can saturate 10Gb Ethernet, which can no longer keep up with the power of other modern hardware. As the performance capabilities of the hosts increase, so should those of your network.

What has improved?

Of the many performance enhancements made to vSAN 7 U2, one helps drive better performance for data assigned RAID-5/6 erasure coding. The enhancement occurs in vSAN’s distributed object manager, the layer responsible for how and where the data is written.

The improvements relate to how and when vSAN performs a parity calculation, optimizing how vSAN reads old data and parity information to write the new data and parity. vSAN 7 U2 improves the caching of the old data fragments that are read for the data in other columns of the same row for a 3+1 or 4+2 stripe. This helps to update parity information without incurring the cost of a read operation and the other overhead of those reads.

Figure 3. Improved read-modify-write parity calculations for RAID-5/6 erasure coding in vSAN 7 U2.

In other words, the read-modify-write method using erasure codes is more efficient in vSAN 7 U2. This is only a calculation optimization, meaning that there are no structural changes to the layout used for data with parity in vSAN’s erasure codes. The memory used for this enhancement to the parity calculations is not related to the “client cache” feature of vSAN, nor does it increase memory usage on the host.
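The effect of caching old fragments can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not vSAN's code: the `StripeRow` class, its in-memory "backend", and its dictionary cache are all hypothetical. It models one row of a 3+1 stripe in which each write to a column recomputes parity from the other columns of the same row, and it counts how many fragment reads reach the backend with and without a cache.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class StripeRow:
    """Toy model of one row of a 3+1 stripe that counts backend reads."""

    def __init__(self, data: List[bytes], use_cache: bool) -> None:
        self.data = list(data)
        self.parity = reduce(xor_bytes, self.data)
        self.use_cache = use_cache
        self.cache: Dict[int, bytes] = {}  # hypothetical fragment cache
        self.backend_reads = 0

    def _read(self, col: int) -> bytes:
        """Return a fragment, touching the backend only on a cache miss."""
        if self.use_cache and col in self.cache:
            return self.cache[col]
        self.backend_reads += 1
        fragment = self.data[col]
        if self.use_cache:
            self.cache[col] = fragment
        return fragment

    def write(self, col: int, new: bytes) -> None:
        """Write one column and recompute parity from the row's other columns."""
        others = [self._read(c) for c in range(len(self.data)) if c != col]
        self.parity = reduce(xor_bytes, others, new)
        self.data[col] = new
        if self.use_cache:
            self.cache[col] = new  # keep the cache coherent with the write


def burst(use_cache: bool) -> StripeRow:
    """Overwrite every column of the row, one write at a time."""
    row = StripeRow([b"aaaa", b"bbbb", b"cccc"], use_cache)
    for col, new in enumerate([b"AAAA", b"BBBB", b"CCCC"]):
        row.write(col, new)
    return row


uncached, cached = burst(False), burst(True)
assert uncached.parity == cached.parity  # same result, fewer reads
print("backend reads without cache:", uncached.backend_reads)
print("backend reads with cache:", cached.backend_reads)
```

In this model, a burst that touches all three columns of the row issues six backend reads without the cache and two with it, while producing identical parity. That is the general shape of the saving the post describes: the on-disk layout is unchanged, and only the reads needed to calculate parity are avoided.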

Note: This improvement was enhanced even further in vSAN 7 U3. See the post “Improving RAID-5/6 in vSAN 7 U3 using Heuristics” for more details.

Where will I see the improvement, and by how much?

This type of optimization will be most beneficial to workloads using a RAID-5 or RAID-6 storage policy and issuing bursts of large sequential writes. Large sequential writes tend to write or update more fragments of a stripe and take the best advantage of this latest optimization. The reduced CPU cost per I/O helps not only the VM using the erasure code storage policy but all other VMs in the environment, as performance is closely tied to efficiency.

Storage policies make it easy to see if this improvement allows you to run a given workload under a more space-efficient setting. Perhaps you tried RAID-5/6 with a workload on a previous occasion, but the erasure code combined with your underlying hardware didn’t quite meet the performance requirements of the application. Once you have upgraded to vSAN 7 U2, you can assign the VMDK object, VM, or group of VMs to a storage policy that uses RAID-5/6 erasure coding, wait for the resynchronization to complete, and see if it meets the requirements. Remember that if it still doesn’t meet your needs, you may be constrained by the performance of storage devices, your network, or some other element in the stack.

How much of an improvement will you see? The answer depends on the workload, and the capabilities of your host and network hardware. I/O amplification and CPU computation are inherently increased using RAID-6 versus RAID-5, so the optimizations will benefit RAID-6 more than RAID-5. Bursts of writes may see a bigger benefit than sustained writes, as the latter may be constrained by the physical characteristics of your capacity tier.
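The reason RAID-6 has more to gain can be seen from textbook read-modify-write accounting, sketched below. These counts are the classic per-write figures for parity RAID in general, not measurements of vSAN, and the function names are hypothetical.

```python
from typing import Dict


def rmw_backend_ios(parity_fragments: int) -> Dict[str, int]:
    """Backend I/Os for updating one data fragment via read-modify-write.

    The old data fragment and every old parity fragment are read, then the
    new data fragment and every new parity fragment are written.
    """
    reads = 1 + parity_fragments
    writes = 1 + parity_fragments
    return {"reads": reads, "writes": writes, "total": reads + writes}


def full_stripe_backend_ios(data_fragments: int, parity_fragments: int) -> Dict[str, int]:
    """Backend I/Os when a write covers every data fragment of a stripe row.

    Parity is computed from the new data alone, so no reads are needed.
    """
    writes = data_fragments + parity_fragments
    return {"reads": 0, "writes": writes, "total": writes}


print("RAID-5 small write:", rmw_backend_ios(parity_fragments=1))
print("RAID-6 small write:", rmw_backend_ios(parity_fragments=2))
print("RAID-5 full-stripe write (3+1):", full_stripe_backend_ios(3, 1))
print("RAID-6 full-stripe write (4+2):", full_stripe_backend_ios(4, 2))
```

Under these assumptions, a single-fragment update costs four backend I/Os with one parity fragment and six with two, so every avoided read removes a larger share of work per guest write on RAID-6. The full-stripe case also shows why large sequential writes behave better: when a write covers the whole row, no old data needs to be read at all.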

Systems not bound by physical hardware characteristics (storage device capabilities and network fabric limitations) will see greater improvements than those constrained by physical hardware. Assuming no significant contention elsewhere in the stack (such as the network), large sequential writes (256K) using RAID-5/6 erasure coding may see up to 50% improvement in some circumstances. Since real-world I/O activity for a given workload is often an ever-changing mix of patterns and sizes, only a subset of conditions may see improvements. This is a good thing, as it is usually just a subset of conditions that are the area of concern anyway.

Summary

Erasure codes are an extremely space-efficient way to store data redundantly. VMware continues to look at ways to close the gap in design tradeoffs when using deterministic space-efficiency options like erasure coding in vSAN. The performance improvements described here and included in vSAN 7 U2 are a great example of this. Faster hardware continues to drive the potential for improved performance, and VMware is determined to keep pace.

@vmpete