How To Double Your VSAN Performance
VSAN 6.0 is now generally available!
Among many significant improvements, performance has been dramatically improved for both hybrid and newer all-flash configurations.
VSAN is almost infinitely configurable: how many capacity devices, disk groups, cache devices, storage controllers, etc. Which brings up the question: how do you get the maximum storage performance out of a VSAN-based cluster?
Our teams are busy running different performance characterizations, and the results are starting to surface. The case for performance growth by simply expanding the number of storage-contributing hosts in your cluster has already been well established: performance scales linearly as more hosts are added to the cluster.
Here, we look at the impact of using two disk groups per host vs. the traditional single disk group. Yes, additional hardware costs more — but what do you get in return?
As you’ll see, these results present a strong case that by simply doubling the number of disk-related resources (e.g. using two storage controllers, each with a caching device and some number of capacity devices), cluster-wide storage performance can be doubled, or more.
Note: just to be clear, two storage controllers are not required to create multiple disk groups with VSAN. A single controller can support multiple disk groups. But for this experiment, that is what we tested.
This is a particularly useful finding, as many people unfamiliar with VSAN mistakenly assume that performance might be limited by the host or network. Not true — at least, based on these results.
For our first result, let’s establish a baseline of what we should expect with a single disk group per host, using a hybrid (mixed flash and disks) VSAN configuration.
Here, each host is running a single VM with IOmeter. Each VM has 8 VMDKs, and 8 worker tasks driving IO to each VMDK. The working set is adjusted to fit mostly in available cache, as per VMware recommendations.
More details: each host is using a single S3700 400GB cache device, and 4 10K SAS disk drives. Outstanding IOs (OIOs) are set to provide a reasonable balance between throughput and latency.
On the left, you can see the results of a 100% random read test using 4KB blocks. As the cluster size increases from 4 to 64, performance scales linearly, as you’d expect. Latency stays at a great ~2msec, yielding an average of 60K IOPS per host. The cluster maxes out at a very substantial ~3.7 million IOPS.
When the mix shifts to random 70% read / 30% writes (the classic OLTP mix), we still see linear scaling of IOPS performance, and a modest increase in latency from ~2.5msec to ~3msec. VSAN is turning in a very respectable 15.5K IOPS per host. The cluster maxes out very close to 1 million IOPS.
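If you want a feel for how throughput, latency and outstanding IOs relate, Little's Law is a handy rule of thumb: average IOs in flight equals IOPS multiplied by average latency. Here is a minimal Python sketch applied to the per-host figures quoted above. The function name is mine, and the result is only the implied average concurrency, not the OIO value actually configured in these tests (which isn't stated here).

```python
def implied_ios_in_flight(iops, latency_ms):
    """Little's Law: average concurrency = throughput x average latency.

    iops       -- IOs completed per second (per host)
    latency_ms -- average latency in milliseconds
    """
    return iops * latency_ms / 1000.0

# Per-host figures quoted in the text for the single disk group hybrid tests.
read_only = implied_ios_in_flight(60_000, 2.0)   # 100% random read
oltp_mix = implied_ios_in_flight(15_500, 3.0)    # 70% read / 30% write

print(f"100% read: about {read_only:.0f} IOs in flight per host")
print(f"70/30 mix: about {oltp_mix:.1f} IOs in flight per host")
```

Treat this as a back-of-the-envelope check only. The latency figures in the text are approximate, so the implied concurrency is approximate too.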
Again, quite impressive. Now let’s see what happens when more storage resources are added.
For this experiment, we added an additional controller, cache and set of capacity devices to each host. And the resulting performance is doubled — or sometimes even greater!
Note that now we are seeing 116K IOPS per host for the 100% random read case, with a maximum cluster output of a stunning ~7.4 million IOPS.
For the OLTP-like 70% read / 30% write mix, we see a similar result: 31K IOPS per host, and a cluster-wide performance of ~2.2 million IOPS.
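To make the comparison concrete, here is a small Python sketch of the arithmetic behind these numbers. The per-host IOPS values and the 64-host cluster size come straight from the text; the function names are my own, and the cluster estimate assumes perfectly linear scaling across hosts. Since the quoted figures are rounded, the estimates will not line up exactly with the reported cluster totals.

```python
HOSTS = 64  # largest cluster size in these tests

# Approximate per-host IOPS quoted in the text (hybrid configuration).
per_host_iops = {
    "100% random read": {"one_disk_group": 60_000, "two_disk_groups": 116_000},
    "70% read / 30% write": {"one_disk_group": 15_500, "two_disk_groups": 31_000},
}

def cluster_iops(per_host, hosts=HOSTS):
    """Estimate cluster-wide IOPS, assuming linear scaling across hosts."""
    return per_host * hosts

def speedup(one_group, two_groups):
    """Ratio of two-disk-group to one-disk-group per-host IOPS."""
    return two_groups / one_group

for workload, result in per_host_iops.items():
    one = result["one_disk_group"]
    two = result["two_disk_groups"]
    print(
        f"{workload}: {speedup(one, two):.2f}x per host, "
        f"estimated cluster total {cluster_iops(two) / 1e6:.2f} million IOPS"
    )
```

The speedup ratio is the interesting part: it comes out at or very near 2x for both workloads using the per-host figures above.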
For all-flash configurations of VSAN, we see similar results, with one important exception: all-flash configurations are far less sensitive to the working set size. They deliver predictable performance and latency almost regardless of what you throw at them. Cache in all-flash VSAN is used to extend the life of write-sensitive capacity devices, and not as a performance booster as is the case with hybrid VSAN configurations.
In this final test, we look at an 8 node VSAN configuration, and progressively increase the working set size to well beyond available cache resources. Note: these configurations use a storage IO controller for the capacity devices, and a PCI-e cache device which does not require a dedicated storage controller.
On the left, we can see the working set increasing from 100GB to 600GB, using our random 70% read / 30% write OLTP mix as before.
Note that IOPS and latency remain largely constant: ~40K IOPS per node with ~2msec latency. Pretty good, I’d say.
On the right, we add another disk group (with dedicated controllers) to each node and instead vary the working set size from an initial 100GB to a more breathtaking 1.2TB. Keep in mind, these very large working set sizes are essentially worst-case stress tests, and not the sort of thing you’d see in a normal environment.
Initially, performance is as you’d expect: roughly double that of the single disk group configuration (~87K IOPS per node, ~2msec latency). But as the working set size increases (and, correspondingly, pressure on write cache), note that per-node performance declines to ~56K IOPS per node, and latency increases to ~2.4 msec.
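To put that decline in perspective, here is a quick Python sketch that works out the percentage changes from the approximate per-node figures above. The helper name is mine, and the inputs are the rounded values from the text, so treat the outputs as rough.

```python
def pct_change(before, after):
    """Signed percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0

# Two disk groups per node, working set growing from 100GB to 1.2TB.
iops_change = pct_change(87_000, 56_000)   # roughly a 36% drop
latency_change = pct_change(2.0, 2.4)      # roughly a 20% rise

# Compare the stressed two-disk-group result with the ~40K IOPS
# single-disk-group result from the left-hand test.
vs_single_group = 56_000 / 40_000

print(f"IOPS change under stress:    {iops_change:+.1f}%")
print(f"Latency change under stress: {latency_change:+.1f}%")
print(f"Still {vs_single_group:.1f}x the single disk group result")
```

So even at the worst-case end of the test, the two disk group configuration still comes out ahead of the single disk group configuration on a per-node basis.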
What Does It All Mean?
VSAN was designed to be scalable depending on available hardware resources. For even modest cluster sizes (4 or greater), VSAN delivers substantial levels of storage performance.
With these results, we can clearly see two axes to linear scalability — one as you add more hosts in your cluster, and the other as you add more disk groups in your cluster.
Still on the table (and not discussed here): things like faster caching devices, faster spinning disks, more spinning disks, larger caches, etc.
It’s also important to point out what is not a limiting factor here: compute, memory and network resources. The only constraint in these tests is the IO subsystem, which consists of a storage IO controller, a cache device and one or more capacity devices.
The other implication is incredibly convenient scaling of performance as you grow — by either adding more hosts with storage to your cluster, or adding another set of disk groups to your existing hosts.
What I find interesting is that we really haven’t found the upper bounds of VSAN performance yet. Consider, for example, a host may have as many as FIVE disk groups, vs the two presented here. The mind boggles …
I look forward to sharing more performance results in the near future!