Technical VCF Storage (vSAN)

Performance Improvements with the Express Storage Architecture in vSAN 8 U1

The Express Storage Architecture (ESA) in vSAN 8 introduced a new way of processing and storing data. It gives our customers all-new capabilities, drives better performance, and improves space efficiency, all while consuming fewer CPU resources. It is a remarkable feat of engineering that sets the stage for new capabilities not possible with the original storage architecture.

With vSAN 8 U1, we keep this momentum going with two new, very compelling enhancements that improve the performance and efficiency of storage processing. Let’s look at each in more detail to understand what these enhancements are and the conditions in which they help.

vSAN ESA Adaptive Write Path

Our new adaptive write path in vSAN 8 U1 gives the ESA one of two approaches to writing data. The default write path accommodates a mix of I/O sizes, while a new alternate write path is optimized to handle large I/O from guest VMs. Why would we need two ways of writing data? A bit of context will help explain why this is so important.

Modern flash devices prefer a certain minimum amount of data to be written. This helps reduce wear and garbage collection activity on the devices. The vSAN ESA was designed specifically to write chunks of data that are optimally sized for these devices. But the ESA was also designed to write the data in a space-efficient way with the least amount of effort. To achieve this, we use RAID-5/6 erasure coding, writing the data as full stripe writes. These types of writes avoid the read-modify-write steps often associated with writing data using erasure codes.
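To make the read-modify-write penalty concrete, here is a minimal Python sketch contrasting a full stripe write with a partial stripe update. It uses a simple 4+1 single-parity XOR layout purely for illustration; vSAN’s RAID-5/6 erasure coding, stripe sizes, and parity math are its own and are not shown here.

```python
# Illustrative sketch only (not vSAN code): contrasts a full stripe write
# with the read-modify-write sequence of a partial stripe update, using a
# simplified 4+1 XOR parity layout.

from functools import reduce

DATA_COLUMNS = 4  # data blocks per stripe (hypothetical layout)

def xor_blocks(blocks):
    """Compute a single XOR parity block across equal-sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def full_stripe_write(data_blocks):
    """All data for the stripe is in hand: compute parity and write everything
    once. No reads of existing data or parity are required."""
    assert len(data_blocks) == DATA_COLUMNS
    parity = xor_blocks(data_blocks)
    return {"reads": 0, "writes": DATA_COLUMNS + 1, "parity": parity}

def partial_stripe_update(old_block, new_block, old_parity):
    """Updating one block in an existing stripe: read the old data and parity,
    recompute parity, then write new data and parity (read-modify-write)."""
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_block, new_block))
    return {"reads": 2, "writes": 2, "parity": new_parity}
```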

Unfortunately, VMs don’t always write data in large chunks. Many times, they will be updating just small amounts of data. To account for this, the ESA uses a log-structured file system to coalesce incoming I/O and persist the data and metadata to a log so we can send the write acknowledgment back as quickly as possible. This keeps latency to VMs low and consistent. We then write the data from the log as a full stripe write at a later time. That represents the default write path that shipped with the ESA in vSAN 8.
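A rough sketch of that default flow, with hypothetical names and a hypothetical stripe size, might look like the following in Python. It is a conceptual outline of the behavior described above, not vSAN code.

```python
# Conceptual sketch (not vSAN's implementation) of the default write path:
# small guest writes are persisted to a durable log and acknowledged right
# away, then coalesced and flushed later as full stripe writes.

STRIPE_PAYLOAD = 4 * 64 * 1024  # assumed payload of one full stripe write

class DefaultWritePath:
    def __init__(self):
        self.durable_log = []       # stands in for the mirrored, low-latency log
        self.full_stripes = []      # stands in for erasure-coded capacity storage
        self.pending = bytearray()  # coalesced data awaiting a full stripe flush

    def write(self, payload: bytes) -> str:
        # Persist data + metadata to the durable log so the acknowledgment
        # can be returned to the guest as quickly as possible.
        self.durable_log.append(payload)
        self.pending += payload
        # Later (shown inline here for brevity), coalesced data is written as
        # full stripes, avoiding read-modify-write on the capacity leg.
        while len(self.pending) >= STRIPE_PAYLOAD:
            stripe = bytes(self.pending[:STRIPE_PAYLOAD])
            del self.pending[:STRIPE_PAYLOAD]
            self.full_stripes.append(stripe)
        return "ack"  # acknowledgment sent after the durable log write
```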

But VMs may also issue large writes. The adaptive write path in vSAN 8 U1 writes data more optimally under these conditions. When vSAN identifies certain conditions, such as writes using large I/O sizes or large quantities of outstanding I/O, it will use a new large I/O write path. The ESA in vSAN 8 U1 will immediately commit those I/Os from our in-memory stripe buffer as a full stripe write, and only write the metadata to our log-structured file system. The write acknowledgment is sent to the VM once the data has been committed as a full stripe write and the metadata has been committed to the durable log. While the data payload bypasses our durable log, a very small amount of metadata is still written to the two-way or three-way mirror (depending on the assigned storage policy) to ensure it is stored resiliently.

This decision process by vSAN occurs in real time on a per-object basis. vSAN has mechanisms in place to determine whether subsequent I/Os meet the criteria for the large I/O write path (a decision made in a matter of microseconds) and will fall back to the default write path if they do not.
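Conceptually, the per-I/O choice could be sketched as follows. The thresholds and names below are invented placeholders; vSAN’s actual heuristics and values are internal and not described in this article.

```python
# Illustrative decision sketch only. LARGE_IO_BYTES and HIGH_OUTSTANDING_IO
# are hypothetical placeholders, not vSAN's real thresholds.

LARGE_IO_BYTES = 256 * 1024      # hypothetical "large write" threshold
HIGH_OUTSTANDING_IO = 32         # hypothetical outstanding-I/O threshold

def choose_write_path(io_size: int, outstanding_io: int) -> str:
    """Per-object, per-I/O choice between the default and large I/O paths."""
    if io_size >= LARGE_IO_BYTES or outstanding_io >= HIGH_OUTSTANDING_IO:
        # Large I/O path: commit the data directly as a full stripe write,
        # persist only metadata to the mirrored durable log, then acknowledge.
        return "large-io-path"
    # Otherwise use the default path: data + metadata go to the durable log,
    # the write is acknowledged, and the full stripe is written later.
    return "default-path"
```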

Figure 1. Incoming writes using ESA’s large I/O write path.

Compared to where the data is written using the ESA’s default write path for an object using a RAID-6 erasure code, this reduces the write amplification of the data payload from 4.5x (3x for a three-way mirror + 1.5x for a 4+2 RAID-6 stripe with double parity) to just 1.5x. This not only reduces the amount of computational effort on the CPUs but also reduces network traffic. The new adaptive write path will result in higher throughput for workloads that tend to generate large I/O sizes, or a large amount of pending I/O, a situation commonly referred to as high outstanding I/O.
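The arithmetic behind those figures can be restated as a quick calculation, assuming the 3x term corresponds to the mirrored log writes of the default path described earlier:

```python
# Back-of-the-envelope write amplification of the data payload for a
# RAID-6 (4+2) object, restating the figures above.

mirror_copies = 3                    # three-way mirrored write in the default path
raid6_stripe = (4 + 2) / 4           # 6 blocks written per 4 blocks of data = 1.5x

default_path = mirror_copies + raid6_stripe  # log write + later full stripe = 4.5x
large_io_path = raid6_stripe                 # data bypasses the log         = 1.5x

print(default_path, large_io_path)   # 4.5 1.5
```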

Optimized I/O Processing for Single VMDK Objects

The vSAN ESA opens up the data path in a way that allows VMs to process I/O at new levels. Not only does this allow data to be written and read faster, but it also helps our engineering teams at VMware identify new areas in the stack to optimize in order to drive performance even higher. This is not unusual. Processes inside software are often written with some understanding of the limits of the hardware and the other processes in the stack. As hardware advances, these processes must be rewritten to take advantage of the new performance potential.

vSAN 8 U1 introduces additional helper threads in our distributed object manager: the layer of the vSAN stack that, among other things, is responsible for coordinating and processing I/O to an object. These helper threads disperse the effort across more CPU cores and reduce the likelihood of CPU exhaustion when a discrete process is overwhelmed.
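As a loose illustration of the idea (not the distributed object manager’s actual design), spreading one object’s I/O processing across several helper threads rather than a single worker looks conceptually like this; the names and thread count are hypothetical.

```python
# Conceptual sketch only: fanning one object's I/O work out to helper
# threads so no single worker (and no single CPU core) becomes the bottleneck.

from concurrent.futures import ThreadPoolExecutor

HELPER_THREADS = 4  # hypothetical number of helpers per object

def process_io(request):
    """Placeholder for the per-I/O work a single worker would otherwise do."""
    return request

def handle_object_io(requests):
    # With one worker, a saturated core caps the object's throughput.
    # Helper threads let independent I/Os for the same object proceed in parallel.
    with ThreadPoolExecutor(max_workers=HELPER_THREADS) as pool:
        return list(pool.map(process_io, requests))
```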

Figure 2. Optimized I/O processing for single VMDK objects.

This increased parallelism will be most noticeable on resource-intensive VMs that process a lot of I/O per VMDK and were previously constrained by the stack in some way. Whether they are mission-critical applications or heavy transaction-based systems, they may see up to a 25% improvement in IOPS and throughput. Our internal testing has demonstrated that this enhancement benefits a variety of workload types, including VMs issuing large sequential writes as well as small random reads. Even when workloads do not directly benefit from this feature, there can be a second-order benefit: a reduction in contending processes and resources allows other, unrelated activities using the same shared resources to complete more quickly.

Summary

The two new performance-focused enhancements described here are a great example of how quickly vSAN and its Express Storage Architecture are advancing.

@vmpete