Technical VCF Storage (vSAN)

vSAN Max and the Advantage of Scalability

The introduction of vSAN Max at VMware Explore Las Vegas garnered a lot of attention, and rightly so. Built on the vSAN Express Storage Architecture (ESA), it provides high-performance, petabyte-scale shared storage for vSphere clusters using commodity servers.

One of the many questions I received at Explore was some variation of: “vSAN Max looks interesting, but why would this be better than a traditional modular storage array?” It is an important question to ask, because the answer highlights one of the many benefits of vSAN Max: incremental scalability of performance and capacity.

Let’s explore this answer in a little more detail to better understand one way that vSAN Max is different from a traditional storage array, and why it matters.

Traditional Three-Tier Storage using Modular Storage Arrays

Most storage arrays currently in use consist of a base chassis, or head unit, that contains processors, memory, redundant storage controllers, and storage devices. Capacity can sometimes be expanded with additional expansion units connected over a backplane. This is often referred to as a “modular scale-up” approach, because the head unit typically remains responsible for processing all of the data that resides in the attached expansion units. The head unit and its storage controllers provide the conduit and the processing of I/O for the connected vSphere hosts, funneling all data I/O through the controllers. The array then presents volumes to vSphere hosts, which consume them using a clustered file system such as VMFS, or as vVols.

It is an approach that has had success over the years, and it was also one of the easiest ways to bring a shared storage solution to market, as evidenced by the dozens of shared storage offerings over the past two decades. While device backplanes, controller interconnects, protocols, and processing capabilities have all improved over the years, workload demands and capacity requirements have also increased dramatically. Since the general principles behind the architecture remain the same, so do its challenges: most notably, accommodating growth and change in the data center. No matter what hardware and software technologies are used, this type of architecture has historically struggled to scale capacity incrementally and economically because of its use of a head unit combined with a limited number of expansion units. Scaling performance is typically even more challenging, because funneling I/O through central controllers creates a potential point of contention and makes it difficult to ensure that the head unit can meet current and future performance needs.

Figure 1. Scaling capacity and performance in a three-tier architecture using a storage array.

Centralized connection points that funnel I/O can also introduce unpredictable performance for some I/O over other I/O, since the array processes I/O on a “first come, first served” basis, which appears arbitrary from the perspective of the hosts. These array controllers often use a globally shared buffer and caching tier. With multiple workloads using the same resources, the hot data of one important VM may be flushed by the large, but less important, working set of another workload.
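To make this failure mode concrete, here is a minimal Python sketch of a globally shared caching tier, assuming simple LRU eviction (the class and workload names are purely illustrative, not vSAN or any array's internals). A large scan from one workload pushes out the hot blocks of a more important VM:

```python
# Minimal sketch of a globally shared LRU caching tier, as described above.
# All names are hypothetical; real array caching policies vary.
from collections import OrderedDict

class SharedLRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> owning workload

    def access(self, block: str, workload: str):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # refresh recency on a hit
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block] = workload

cache = SharedLRUCache(capacity=4)
for b in ("vm1-a", "vm1-b"):       # hot blocks of an important VM
    cache.access(b, "important-vm")
for i in range(4):                 # large working set of another workload
    cache.access(f"scan-{i}", "batch-job")
print(list(cache.blocks))  # ['scan-0', 'scan-1', 'scan-2', 'scan-3']
```

The important VM's hot blocks are gone from the cache, even though the batch job's data may never be read again.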

To help offset these concerns, it is not unusual for array vendors to offer several head units with varying degrees of processing power and cost. However, oversizing an array head unit places a burden on the customer to spend today’s dollars on depreciating hardware in the hope of avoiding a problem later. There is also no guarantee that I/O contention will be avoided, as the architecture may be bound or constrained in other ways. The storage fabric, the interconnected storage controllers, and even the storage devices themselves can influence performance, especially if SATA or SAS interfaces paired with interposers are used.

Distributed Scale-Out Architecture in vSAN Max

vSAN Max is different. It is a fully distributed architecture, where access to data is NOT funneled through centralized I/O controllers but instead uses the full power of every node (host) in the vSAN Max cluster. The aggregated resources across all hosts in a vSAN Max cluster contribute to the processing of I/O. As a result, adding new hosts scales capacity and performance linearly.

Let’s use the following example, where each host in a vSAN Max cluster has 300TB of capacity, 56 cores of compute, and a 100GbE uplink to exploit the full capabilities of the NVMe storage devices in each host. That means that a 6-host vSAN Max cluster would have an aggregate capacity of 1.8PB, aggregate compute resources of 336 cores to process I/O, and an aggregate bandwidth of 600Gbps to serve the needs of the connected vSphere hosts.

And for every host added, one adds 300TB of capacity, 100Gbps of networking, and 56 more storage processing cores. A vSAN Max cluster consisting of 12 hosts would have twice the resources and capabilities of a 6-host cluster, and a cluster consisting of 18 hosts would have three times the resources and capabilities of a 6-host cluster.
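To make the arithmetic concrete, here is a minimal Python sketch (the per-host figures come from the example above; the helper function is purely illustrative) showing how aggregate resources grow linearly with host count:

```python
# Per-host figures from the example above.
TB_PER_HOST = 300     # storage capacity (TB)
CORES_PER_HOST = 56   # compute cores for I/O processing
GBPS_PER_HOST = 100   # network uplink (Gbps)

def aggregate_resources(hosts: int) -> dict:
    """Aggregate capacity, cores, and bandwidth for a vSAN Max cluster."""
    return {
        "capacity_tb": hosts * TB_PER_HOST,
        "cores": hosts * CORES_PER_HOST,
        "bandwidth_gbps": hosts * GBPS_PER_HOST,
    }

for hosts in (6, 12, 18):
    r = aggregate_resources(hosts)
    print(f"{hosts:>2} hosts: {r['capacity_tb'] / 1000:.1f}PB, "
          f"{r['cores']} cores, {r['bandwidth_gbps']}Gbps")
#  6 hosts: 1.8PB, 336 cores, 600Gbps
# 12 hosts: 3.6PB, 672 cores, 1200Gbps
# 18 hosts: 5.4PB, 1008 cores, 1800Gbps
```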

Eliminating centralized connection points that funnel I/O improves the predictability of discrete workloads, because each workload may use a path different from those of other workloads. And since each object’s data path can be dynamically tuned by vSAN, this can help prevent cases where one resource-intensive VM impacts another.

Figure 2. Linear, predictable scalability of capacity and performance with vSAN Max.

Note that vSAN Max can not only scale out by adding more hosts to a cluster, but can also scale up by adding more storage devices to each host. Imagine a scenario where you have an 8-host vSAN Max cluster and wish to add more capacity. One could easily acquire, say, two more 16TB storage devices per host, and end up with about 256TB of additional cluster capacity without adding any more servers. These types of incremental costs can often be approved as a relatively trivial expense, versus the approvals needed for a much larger capital expenditure.
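A quick sketch of that math, assuming the figures from the scenario above (the function is illustrative; actual usable capacity will be lower once storage policies such as RAID-5/6 erasure coding are applied):

```python
def added_raw_capacity_tb(hosts: int, devices_per_host: int, device_tb: float) -> float:
    """Raw capacity (TB) added across the cluster by a scale-up expansion."""
    return hosts * devices_per_host * device_tb

# 8 hosts, 2 new 16TB devices per host
print(added_raw_capacity_tb(8, 2, 16))  # 256.0 TB of additional raw capacity
```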

Figure 3. Scaling up host capacities for quick and affordable growth.

If the cluster is using vSAN ESA’s “Managed Disk Claim” capability, the new devices will be recognized automatically, making the process of scaling up easy and fast.

vSAN Max not only has a substantial benefit in meeting performance and capacity requirements as the cluster grows, but also in handling failure conditions. The failure of a storage node in a 12-host vSAN Max cluster impacts only 1/12th of the cluster's resources, as the I/O load is redistributed evenly across the remaining 11 hosts. This impact is typically far less than what would occur with a controller failure in a dual-controller storage array.
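As a rough illustration, and assuming the I/O load is redistributed evenly across the surviving nodes as described above, compare losing one node in a 12-host vSAN Max cluster with losing one controller in a dual-controller array (modeled here simply as a 2-node system):

```python
def failure_impact(nodes: int) -> tuple[float, float]:
    """Fraction of resources lost, and extra load each survivor absorbs."""
    lost = 1 / nodes         # share of total resources taken offline
    extra = 1 / (nodes - 1)  # added load per surviving node
    return lost, extra

for n in (2, 12):
    lost, extra = failure_impact(n)
    print(f"{n:>2} nodes: lose {lost:.1%}, each survivor absorbs {extra:.1%} more load")
#  2 nodes: lose 50.0%, each survivor absorbs 100.0% more load
# 12 nodes: lose 8.3%, each survivor absorbs 9.1% more load
```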

Beyond Predictable, Incremental Scalability

The distributed architecture of vSAN Max makes scaling both capacity and performance achievable. But it isn’t just the physically distributed architecture that makes it special; it is also how vSAN Max stores data. Unlike a traditional clustered file system such as VMFS, vSAN Max uses an approach analogous to an object store, where the boundary of data managed and accessed is much smaller than a traditional VMFS datastore. We do this for specific reasons, as noted in the post “vSAN Objects and Components Revisited.” This object-based storage, paired with ESA’s design, keeps the failure domain small and makes the platform incredibly robust regardless of cluster size.

And let’s not forget vSAN Max’s unique data path. Since it is built on the vSAN ESA, its high-performance data path and scalable, efficient data structure provide not only near device-level performance from extremely fast NVMe devices, but also the incremental scalability desired across all types of use cases. For example, vSAN Max can dynamically use one of two write paths based on the characteristics of the incoming writes, tuning the path based on the intensity of writes for a given object. This allows vSAN Max to ingest even the most demanding large sequential writes quickly – the perennial challenge for even the best storage arrays. But writing data is only half the battle. A storage system must be able to read data reliably and efficiently even as stored data multiplies to extraordinary capacities. Fast and efficient reads are a fundamental part of the vSAN ESA, which powers vSAN Max clusters.

Summary

Deploying a storage platform that is sized correctly today while easily accommodating future growth is easy to comprehend, but difficult to achieve without the right architecture. vSAN Max offers incremental and predictable scalability for environments where you wish to keep your storage resources separate from your compute resources, yet still manage them in the same way as your vSAN HCI clusters.

@vmpete