The post “An Introduction to the vSAN Express Storage Architecture” detailed the key drivers behind VMware’s introduction of its revolutionary, optional architecture in vSAN 8. The value of this new architecture is ultimately based on the benefits it delivers to our customers. One of the most noteworthy capabilities the ESA provides is the delivery of highly resilient data in a space-efficient manner using RAID-5/6 erasure coding without any compromise in performance. Customers will now be able to achieve RAID-1 levels of performance when using erasure coding schemes that deliver an optimal and predictable level of space savings compared to RAID-1 mirroring.
Let’s look at how this is achieved, and the challenge it solves for administrators.
Eliminating the Performance/Space Efficiency Trade-Off
Erasure coding is a common way to store data in a resilient and space-efficient manner. It is not unusual to experience a performance tradeoff with this type of data placement scheme, because it typically takes more effort and resources to store data in this way. While past editions of vSAN using the Original Storage Architecture (OSA) made great strides in making erasure codes more efficient, the disparity still existed. Administrators needed to decide on storage policy rules based on what was more important for the given workload: performance or space efficiency. While storage policies allowed for relatively easy adjustment, eliminating a variable like this would make vSAN that much easier to use.
How the vSAN ESA Achieved Performance without Compromise
The performance/space efficiency tradeoff is eliminated in part by the vSAN ESA’s new patented log-structured file system (vSAN LFS) and its optimized data structure. Note that references to a “log-structured file system” do not refer in any way to a traditional file system such as NTFS or ext4. This common industry term refers to a method of persisting data and metadata in the storage subsystem.
In Figure 1, we show a simplified data path for a VM assigned a level of resilience of FTT=2 using RAID-6. The vSAN LFS will ingest incoming writes, coalesce them, package them with metadata, and write them to a durable log that is tied to that specific object. When the durable log has received the packaged data, it will return a write acknowledgment to the guest VM to keep latency to a minimum.
This durable log lives on a new branch of the data structure in an object known as a “performance leg.” The data in this durable log will be stored as components, in a mirrored fashion across more than one host. For an object assigned FTT=1 using RAID-5, it creates a two-way mirror. For RAID-6, a three-way mirror is created.
Figure 1. Ingesting writes with the vSAN log-structured file system
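The ingest steps shown in Figure 1 can be illustrated with a toy model. This is only a minimal sketch of the sequence described above (coalesce, package with metadata, mirror, acknowledge); the class name, the dictionary layout, and the metadata fields are hypothetical and are not vSAN data structures or APIs.

```python
from dataclasses import dataclass, field

# Mirror width of the performance leg as stated in the post:
# RAID-5 (FTT=1) -> two-way mirror, RAID-6 (FTT=2) -> three-way mirror.
MIRROR_COPIES = {"RAID-5": 2, "RAID-6": 3}


@dataclass
class DurableLog:
    """Toy model of one object's durable log (hypothetical, for illustration)."""
    policy: str
    replicas: list = field(init=False)

    def __post_init__(self):
        # One list per mirrored component; each would live on a different host.
        self.replicas = [[] for _ in range(MIRROR_COPIES[self.policy])]

    def ingest(self, writes):
        """Coalesce writes, package them with metadata, mirror, then acknowledge.

        `writes` is a list of (offset, payload_bytes) tuples.
        """
        # Coalesce: order by offset and join the payloads into one chunk.
        ordered = sorted(writes, key=lambda w: w[0])
        payload = b"".join(data for _, data in ordered)
        # Package with metadata (assumed here to be just offsets and lengths).
        entry = {
            "metadata": [(offset, len(data)) for offset, data in ordered],
            "payload": payload,
        }
        # Persist the same entry to every mirror copy of the durable log.
        for replica in self.replicas:
            replica.append(entry)
        # Acknowledge only after every mirror copy holds the entry.
        return "ack"


log = DurableLog("RAID-6")
print(log.ingest([(8, b"bb"), (0, b"aa")]))
print(len(log.replicas))
```

The point of the sketch is the ordering: the guest receives its acknowledgment as soon as the mirrored log holds the data, before any erasure-coded stripe has been written.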
Continuing in Figure 1, as data in the durable log accumulates, the LFS will make room for new incoming I/Os by taking these large chunks of data and writing them to the other branch in the object data structure known as the “capacity leg.” It sends the data as a fully aligned, full stripe write that is reflective of the storage policy assigned to the object (RAID-5 or RAID-6). It avoids the read/modify/write process that can be found in other approaches. The result is a very efficient, large, fully aligned, full stripe write with a minimal amount of CPU and I/O amplification.
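To see why a full stripe write matters, it helps to count device I/Os. The following is textbook RAID arithmetic rather than a measurement of vSAN, and it assumes a 4+2 stripe for RAID-6 (consistent with the 1.5x capacity figure quoted later in this post); the function names are made up for illustration.

```python
def rmw_ios(parity_blocks):
    """Device I/Os to update ONE data block in place (read/modify/write)."""
    reads = 1 + parity_blocks   # old data block + each old parity block
    writes = 1 + parity_blocks  # new data block + each new parity block
    return reads + writes


def full_stripe_ios_per_data_block(data_blocks, parity_blocks):
    """Device I/Os per data block when a whole stripe is written at once."""
    # No reads are needed: parity is computed from data already in memory.
    return (data_blocks + parity_blocks) / data_blocks


print(rmw_ios(parity_blocks=1))              # single parity small write: 4
print(rmw_ios(parity_blocks=2))              # dual parity small write: 6
print(full_stripe_ios_per_data_block(4, 2))  # assumed 4+2 full stripe: 1.5
```

Under these assumptions, a small in-place update on a dual-parity stripe costs six I/Os for one block of data, while a full stripe write costs 1.5 I/Os per block and involves no reads at all.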
The vSAN LFS only writes the data payload to the full stripe write on the capacity leg. The metadata associated with that chunk of data is transitioned to a metadata log, where it can be accessed quickly and retained for a longer period. The LFS has other mechanisms in place to retain metadata even longer, using multiple data tree structures known as B-Trees.
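The split between payload and metadata can be sketched as follows. This is a simplified illustration, not vSAN's implementation: it uses a single XOR parity block for brevity (RAID-6 requires a second, independent parity that is not shown), and the entry layout, function names, and parameters are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def flush_full_stripe(entry, data_blocks, block_size, metadata_log):
    """Turn one durable-log entry into a full stripe; keep metadata separately."""
    payload = entry["payload"]
    stripe_bytes = data_blocks * block_size
    if len(payload) != stripe_bytes:
        raise ValueError("sketch expects exactly one full stripe of payload")
    # Split the payload into equally sized data blocks.
    blocks = [payload[i:i + block_size] for i in range(0, stripe_bytes, block_size)]
    # Parity comes from data already in memory, so nothing is read back.
    parity = xor_blocks(blocks)
    # Only the payload goes to the stripe; metadata moves to its own log.
    metadata_log.append(entry["metadata"])
    return blocks + [parity]


metadata_log = []
entry = {"metadata": [(0, 8)], "payload": b"\x01\x02\x04\x08\x10\x20\x40\x80"}
stripe = flush_full_stripe(entry, data_blocks=4, block_size=2, metadata_log=metadata_log)

# Any single lost data block can be rebuilt from the survivors plus parity.
rebuilt = xor_blocks([stripe[0], stripe[2], stripe[3], stripe[4]])
print(rebuilt == stripe[1])
```

Because the stripe is assembled entirely from data already held in the log, the parity calculation never needs to read old data or old parity from the capacity leg.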
In Figure 2, we see how this RAID-6 object is placed across the vSAN hosts in a cluster. The components in the performance leg of an object will always exist on the same hosts as the components in the capacity leg of the same object. Components are simply an implementation detail of vSAN, meaning the administrator will not need to manage anything differently.
Figure 2. Composition of a RAID-6 object when using the vSAN ESA.
You may wonder how much additional capacity this performance leg may consume. While it exists per object, it is extremely small – just large enough to pipeline the data into a persistent storage area. With such fine levels of granularity, there is no need to factor additional sizing into the design of a vSAN cluster using the ESA. It is a great example of how the ESA in vSAN 8 allows claimed storage devices to contribute to capacity.
There are also some new and unique traits specifically with FTT=1 using RAID-5, but that will be described in another blog post.
Recommendation: When a cluster is using the vSAN ESA, the guidance will be to use RAID-5/6 erasure coding for all cluster types and sizes that support the data placement schemes. RAID-1 mirroring will still be required for site level tolerance in stretched clusters, and host level tolerance for 2-Node clusters. However, secondary levels of resilience in those topologies would be good candidates for RAID-5/6 erasure coding.
What it means to you
The improvements described above are largely transparent to the user, so administrators accustomed to the Original Storage Architecture will be able to operate vSAN in the same way as before, with the following additional benefits:
- Simplified Administration. Eliminating the performance tradeoff between RAID-5/6 and RAID-1 makes vSAN simpler.
- Guaranteed space efficiency for all standard cluster sizes. RAID-5/6 erasure coding has predetermined capacity savings when compared to RAID-1 mirroring. This is your chance to immediately save capacity and reduce your TCO. FTT=2 using RAID-6 will consume just 1.5x when compared to 3x for FTT=2 using RAID-1.
- Higher levels of resilience. If you had workloads that you wished to protect at a higher level of resilience but didn’t want to incur the capacity penalties of FTT=2 using RAID-1, or the performance penalties of FTT=2 using RAID-6, this is the perfect solution for you. FTT=2 using RAID-6 will offer double the resilience and consume less capacity (1.5x) than a less resilient object with FTT=1 using RAID-1 mirroring (2x).
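The capacity multipliers quoted in the list above follow from simple arithmetic. The sketch below assumes RAID-6 is laid out as four data blocks plus two parity blocks (which yields the 1.5x figure) and models RAID-1 as one data copy plus FTT extra copies; the function names and the 100 GB example are illustrative only.

```python
def capacity_multiplier(data_blocks, redundancy_blocks):
    """Raw capacity consumed per unit of usable data."""
    return (data_blocks + redundancy_blocks) / data_blocks


def raw_capacity_gb(usable_gb, multiplier):
    """Raw capacity needed to store `usable_gb` of data."""
    return usable_gb * multiplier


raid1_ftt1 = capacity_multiplier(1, 1)  # one copy + one mirror
raid1_ftt2 = capacity_multiplier(1, 2)  # one copy + two mirrors
raid6_ftt2 = capacity_multiplier(4, 2)  # assumed 4 data + 2 parity

for name, mult in [("FTT=1 RAID-1", raid1_ftt1),
                   ("FTT=2 RAID-1", raid1_ftt2),
                   ("FTT=2 RAID-6", raid6_ftt2)]:
    print(f"{name}: {mult}x, 100 GB usable -> {raw_capacity_gb(100, mult)} GB raw")
```

For 100 GB of data, that works out to 200 GB raw for FTT=1 using RAID-1, 300 GB for FTT=2 using RAID-1, and 150 GB for FTT=2 using RAID-6.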
Recommendation: Keep the Default Storage Policy as RAID-1, and use a new storage policy to take advantage of RAID-5 and RAID-6. A storage policy is an entity of vCenter Server, which may be responsible for many clusters running the vSAN Original Storage Architecture (OSA) where RAID-5/6 may not be appropriate or supported.
Summary
Improving performance and efficiency while simplifying administration was an important goal for VMware. Thanks to the vSAN ESA, customers can now select RAID-5/6 for all their workloads, gaining improved space efficiency without compromising performance.