A key goal of vSAN is to deliver a reliable and consistent computing environment. One challenge for any storage system is making sure that when a device fails, the environment is brought back to full protection levels quickly. vSAN has a unique capability here: all free space is treated as “hot spare” capacity, which allows for incredibly fast rebuilds. Object placement also lowers the risk of concurrent failures impacting a given object as a cluster grows. This behavior is covered in detail in this blog. Beyond failures, other internal operations (rebalancing, adjustments to SPBM policies) can cause data to move throughout the cluster.
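The point about larger clusters lowering the risk to any single object can be illustrated with some simple arithmetic. The sketch below is a simplified model, not vSAN's placement logic: it assumes an object's components sit on three distinct hosts chosen uniformly at random (for example two mirror copies plus a witness), that exactly two hosts fail at once, and that losing any two of the three hosts affects the object. The function name and parameters are hypothetical.

```python
from math import comb


def p_double_failure_hits_object(n_hosts, hosts_per_object=3):
    """Probability that two simultaneous host failures both land on one object.

    Assumes uniform random placement on `hosts_per_object` distinct hosts
    and two failed hosts chosen uniformly at random (a toy model).
    """
    if n_hosts < hosts_per_object:
        raise ValueError("cluster is smaller than the object's footprint")
    # Failed pairs that fall inside the object's hosts / all possible failed pairs.
    return comb(hosts_per_object, 2) / comb(n_hosts, 2)


for n in (4, 8, 16, 32, 64):
    print(f"{n:>2} hosts: {p_double_failure_hits_object(n):.4f}")
```

Under these assumptions the probability falls roughly with the square of the cluster size, from 0.5 in a 4-host cluster to 0.025 in a 16-host cluster.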

Prioritization of resynchronization must be balanced against its overhead on regular virtual machine operations. At the same time, regular I/O must not prevent rebuilds from completing promptly. In vSAN 5.5 and 6.0, limited resynchronization control could be implemented at the CLI, where resynchronization could be throttled manually. In vSAN 6.6 and ESXi 6.0 Patch 2, a static throttle for rebuilds was implemented. vSAN 6.6.1 introduced automated throttling of resynchronization traffic, a general windowing capability based on virtual machine latency.

vSAN 6.7 introduces new disk scheduler capabilities that allow it to distinguish four types of I/O, with a pending queue for each type. This allows resynchronizations to be prioritized while limiting the impact on virtual machine traffic. The new system also continuously measures the throughput capabilities of disk groups and shapes these queues so that they adapt to the hardware and to the workloads that are in flight. Combined with other performance advancements in vSAN 6.7, this should help deliver more reliable and consistent performance to any application.
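The idea of one pending queue per I/O type, shaped against measured throughput, can be sketched as a toy fair-share scheduler. This is only an illustration of the general technique, not vSAN's actual scheduler: the four class names, the share values, and the `FairShareScheduler` class are all assumptions made for the example, and `budget` merely stands in for the throughput a disk group was measured to sustain in one interval.

```python
from collections import deque

# Hypothetical class names; the text only states that there are four I/O types.
IO_CLASSES = ("vm", "resync", "metadata", "namespace")


class FairShareScheduler:
    """Toy scheduler: one pending queue per I/O class, shares applied under contention."""

    def __init__(self, shares):
        # Relative weight per class (illustrative values, not vSAN's).
        self.shares = dict(shares)
        self.queues = {name: deque() for name in IO_CLASSES}

    def submit(self, io_class, size):
        """Place one I/O of `size` units on its class's pending queue."""
        self.queues[io_class].append(size)

    def _drain(self, name, allowance, sent):
        """Dequeue whole I/Os from one class until `allowance` would be exceeded."""
        used = 0
        queue = self.queues[name]
        while queue and queue[0] <= allowance - used:
            used += queue.popleft()
        sent[name] += used
        return used

    def dispatch(self, budget):
        """Dispatch up to `budget` units for one interval; return units sent per class."""
        sent = {name: 0 for name in IO_CLASSES}
        remaining = budget
        # Pass 1: every backlogged class receives its proportional share.
        active = [name for name in IO_CLASSES if self.queues[name]]
        total = sum(self.shares[name] for name in active)
        for name in active:
            allowance = budget * self.shares[name] // total
            remaining -= self._drain(name, allowance, sent)
        # Pass 2: unused budget goes to whatever is still queued.
        for name in IO_CLASSES:
            remaining -= self._drain(name, remaining, sent)
        return sent


shares = {"vm": 60, "resync": 20, "metadata": 10, "namespace": 10}

# No contention: resync is the only backlogged class and may use the whole budget.
idle = FairShareScheduler(shares)
for _ in range(10):
    idle.submit("resync", 10)
print(idle.dispatch(100))   # resync gets all 100 units

# Contention: VM and resync I/O compete, so the shares bound each class.
busy = FairShareScheduler(shares)
for _ in range(20):
    busy.submit("vm", 10)
    busy.submit("resync", 10)
print(busy.dispatch(100))   # vm gets 80 units, resync gets 20
```

In this sketch resynchronization runs at full speed when nothing else is queued and is held to its share only when virtual machine I/O is competing for the same budget. An adaptive version would recompute `budget` each interval from observed throughput.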

For more information about this significant improvement, read the tech note on StorageHub.