With Virtual SAN 6.2, the VMware team delivered the biggest update to the product since its initial release and the most awaited feature set to date – space efficiency. Virtual SAN 6.2 includes deduplication, compression, and erasure coding. In this post, I’d like to dive a little deeper into each of these features and provide some background on what they mean for customers.
Let me begin by first addressing a question that has come up a lot since we made the release announcement – why do we support the new space efficiency features on All-Flash Virtual SAN only?
When we started working on the space efficiency features, we made a strategic decision to focus our efforts on All-Flash. This was based on a number of factors:
- We recognized that flash storage is undoubtedly the future of primary storage for the data center. Industry analyst reports support this observation as well. IDC, for example, projects that all-flash arrays will grow at a 30% CAGR while hybrid arrays will grow at a 5% CAGR. Many of our customers have instituted policies to move to all-flash server platforms. We believe that the puck is moving toward All-Flash systems in the data center, and we’re making investments to optimize Virtual SAN to be ideally positioned to ride that wave.
- Consistent with our predictions, 2015 was a pivotal year for flash. The capacity of SSDs continued to increase, while their cost continued to decline. In 2015 we reached a tipping point where the cost of the high-capacity SSDs listed by most server vendors is only around 2X the cost of 10K RPM enterprise HDDs. The difference in cost is low enough that data efficiency features such as deduplication and erasure coding can bring the total cost of an all-flash solution on par with, or even below, that of a hybrid solution. What this means for our customers is that it is more cost effective and efficient to use high-capacity SSDs for persistent storage with Virtual SAN.
- Our customers get a bigger bang for the buck with space efficiency on All-Flash. In the Virtual SAN hybrid model, persistent storage is on HDDs, so the dollar savings from enabling space efficiency on hybrid are minimal. Our internal calculations show that deduplication on hybrid would have reduced the overall raw TCO by less than 10% for most of our customers. This does not even take into account the impact on performance, and the resulting increase in cost, which brings me to the final point.
- On the technical side, deduplication of persistent data results in additional random I/O operations for metadata. Erasure coding results in I/O amplification due to the required read-modify-write access patterns. These are the laws of physics. See Christos’ post for a detailed technical discussion. In practice, this means that there could be a significant performance drop on hybrid storage systems. Alternatively, customers would need to increase the number of drives in the system to handle the additional random I/O load. In other words, you may be buying disks for the spindles, not for capacity, thereby negating the cost savings from space efficiency. It is not by accident that no storage vendor supports inline deduplication of persistent data on hybrid storage systems.
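To make the read-modify-write point concrete, here is a back-of-the-envelope Python model of the back-end I/Os triggered by one small random write that updates part of a single stripe. This is the textbook RAID accounting, not a description of Virtual SAN’s internal I/O path, and the function name is hypothetical.

```python
# Hypothetical textbook model: back-end disk operations caused by one small
# random write that updates part of a single stripe (not a full-stripe write).

def backend_ios_per_small_write(scheme: str) -> dict:
    """Return the reads and writes issued to the capacity tier for one small write."""
    if scheme == "RAID-1":   # FTT=1 mirroring: write both replicas, no reads
        return {"reads": 0, "writes": 2}
    if scheme == "RAID-5":   # read old data + old parity, write new data + new parity
        return {"reads": 2, "writes": 2}
    if scheme == "RAID-6":   # read old data + both parities, write all three back
        return {"reads": 3, "writes": 3}
    raise ValueError(f"unknown scheme: {scheme}")

for scheme in ("RAID-1", "RAID-5", "RAID-6"):
    ios = backend_ios_per_small_write(scheme)
    total = ios["reads"] + ios["writes"]
    print(f"{scheme}: {ios['reads']} reads + {ios['writes']} writes = {total} I/Os")
```

On HDDs each of those extra reads is a random seek, which is why a policy that costs little on flash can force a hybrid system to buy spindles for IOPS rather than for capacity.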
For the reasons I’ve cited above, we made a strategic decision to focus on All-Flash Virtual SAN for our space efficiency features. The resulting TCO for our customers has been extremely attractive. As you can see from the chart below, Virtual SAN’s All-Flash configuration is 48% less expensive than our nearest competitor’s hybrid (yes, hybrid) configuration. You can also read more about the future of all-flash here.
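The cost argument in the list above can be checked with simple arithmetic. The sketch below assumes illustrative prices (not quotes from any vendor), counts only the capacity tier, and uses a hypothetical helper `cost_per_effective_gb`: raw $/GB multiplied by the protection overhead (2x for RAID-1 with FTT=1, 4/3 for RAID-5 in a 3+1 layout) and divided by the data reduction ratio.

```python
# Hypothetical, simplified capacity-tier cost model; it ignores the cache tier,
# controllers, licensing, and spare capacity.

def cost_per_effective_gb(raw_price_per_gb: float,
                          protection_overhead: float,
                          reduction_ratio: float = 1.0) -> float:
    """Raw $/GB scaled up by protection overhead and down by data reduction."""
    return raw_price_per_gb * protection_overhead / reduction_ratio

HDD_PRICE = 0.10            # assumed $/GB for 10K RPM enterprise HDDs
SSD_PRICE = 2 * HDD_PRICE   # "around 2X" the HDD cost, per the text

# Hybrid: HDD capacity tier, RAID-1 mirroring (FTT=1), no data reduction
hybrid = cost_per_effective_gb(HDD_PRICE, protection_overhead=2.0)

# All-flash: SSD capacity tier, RAID-5 (3+1), various data reduction ratios
for ratio in (1.0, 1.5, 2.0):
    flash = cost_per_effective_gb(SSD_PRICE, protection_overhead=4 / 3,
                                  reduction_ratio=ratio)
    print(f"reduction {ratio:.1f}x: all-flash ${flash:.3f}/GB vs hybrid ${hybrid:.3f}/GB")

# Reduction ratio at which all-flash RAID-5 matches hybrid RAID-1 on $/GB
break_even = (SSD_PRICE * 4 / 3) / hybrid
print(f"break-even reduction ratio: {break_even:.2f}x")
```

Under these assumptions the break-even data reduction ratio is 4/3, or about 1.33x; any workload that reduces better than that makes all-flash cheaper per usable GB in this simplified model.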
I would like to reiterate here that while we’ve focused our data efficiency features on All-Flash Virtual SAN, we will continue to add features, Ready Nodes and support Hybrid configurations for the foreseeable future. Our goal is to provide the flexibility and choice to our customers and enable them to make the right decision for their specific environment.
Now, on to the specifics of the features.
Deduplication and Compression
In Virtual SAN, both deduplication and compression are treated as one feature—both are enabled at the same time and cannot be independently turned on or off. This makes it simple for customers since they do not need to figure out when to turn on which feature. The feature is turned on at the cluster level, and data is deduplicated at a 4K block size. Our efficient architecture allows us to choose a granular, 4K block size to deliver the best storage efficiency.
Deduplication and compression are performed inline when IOs are de-staged from the caching tier to the persistent tier. We deduplicate the data first and then compress it using LZ4; the compressed form is kept only when a block compresses down to 2KB or less. Deduplication is performed at the disk group level. How much space efficiency one can achieve depends on the workload. I discuss the performance and space efficiency results later in this post.
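The destage path described above can be sketched as a toy Python model. The 4K block size, the 2KB compression threshold, deduplication before compression, and the per-disk-group dedup scope come from the text; everything else is an assumption: `zlib` from the standard library stands in for LZ4, SHA-256 is an assumed fingerprint function, and the `DiskGroup` class is hypothetical.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096        # dedup granularity described in the text
COMPRESS_LIMIT = 2048    # keep the compressed form only if it fits in 2KB

class DiskGroup:
    """Toy model of one disk group's capacity tier (dedup scope = disk group)."""

    def __init__(self):
        self.store = {}      # fingerprint -> (is_compressed, payload)
        self.refcount = {}   # fingerprint -> number of logical references

    def destage(self, block: bytes) -> str:
        """Deduplicate first, then try to compress a block moving off the cache tier."""
        assert len(block) == BLOCK_SIZE
        fp = hashlib.sha256(block).hexdigest()   # assumed fingerprint function
        if fp in self.store:                     # duplicate: only bump the refcount
            self.refcount[fp] += 1
            return fp
        packed = zlib.compress(block)            # stand-in for LZ4
        if len(packed) <= COMPRESS_LIMIT:
            self.store[fp] = (True, packed)
        else:
            self.store[fp] = (False, block)      # poorly compressible: store raw
        self.refcount[fp] = 1
        return fp

    def read(self, fp: str) -> bytes:
        compressed, payload = self.store[fp]
        return zlib.decompress(payload) if compressed else payload

    def physical_bytes(self) -> int:
        return sum(len(payload) for _, payload in self.store.values())

# Usage: two identical zero blocks and one random (incompressible) block
dg = DiskGroup()
zeros = bytes(BLOCK_SIZE)
rand = os.urandom(BLOCK_SIZE)
fps = [dg.destage(b) for b in (zeros, zeros, rand)]
print(f"logical {3 * BLOCK_SIZE} bytes -> physical {dg.physical_bytes()} bytes")
```

The duplicate zero block costs only a refcount increment, the zero block itself compresses well under 2KB, and the random block fails the threshold and is stored uncompressed.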
Erasure Coding (RAID 5/6)
Erasure Coding introduces RAID 5 and RAID 6 capabilities to greatly increase usable capacities by removing the need for complete, mirrored copies of the data. Erasure Coding is configured through the VM Storage Policies. Users can pick either the old RAID-1 Mirroring or the new RAID-5/6 (Erasure Coding). RAID-5 is used when FTT = 1 and RAID-6 is used when FTT = 2.
RAID-5 provides protection against one failure (of a disk or other component), and RAID-6 provides protection against two. With RAID-5, a VMDK (or Virtual SAN object) is striped so that there are 3 data fragments and 1 parity fragment; with RAID-6, the data is split into 4 data fragments and 2 parity fragments. This means that with RAID-5, a minimum of 4 nodes is required to protect against single node failures, and with RAID-6, a minimum of 6 nodes is required. RAID-5/6 can also be used with Fault Domains; in this case, each component resides in a different Fault Domain.
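A RAID-5 stripe can be illustrated with plain XOR parity. The Python sketch below splits a payload into three data fragments plus one parity fragment and rebuilds any single lost fragment from the other three. It is a generic illustration of the 3+1 layout described above, not Virtual SAN’s on-disk format, and the helper names are hypothetical.

```python
from functools import reduce

def xor_bytes(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def raid5_stripe(data: bytes, data_fragments: int = 3) -> list:
    """Split data into equal fragments plus one XOR parity fragment (3+1 by default)."""
    size = -(-len(data) // data_fragments)               # ceiling division
    padded = data.ljust(size * data_fragments, b"\0")    # zero-pad the last fragment
    frags = [padded[i * size:(i + 1) * size] for i in range(data_fragments)]
    return frags + [xor_bytes(*frags)]

def rebuild(stripe: list, lost: int) -> bytes:
    """Recover any single lost fragment (data or parity) by XOR-ing the survivors."""
    survivors = [f for i, f in enumerate(stripe) if i != lost]
    return xor_bytes(*survivors)

# Each of the 4 fragments would live on a different host or Fault Domain.
stripe = raid5_stripe(b"virtual san object payload")
print(len(stripe), rebuild(stripe, lost=1) == stripe[1])
```

RAID-6 cannot be built from XOR alone: tolerating two simultaneous losses requires a second, independent parity, typically computed with Reed-Solomon-style Galois-field arithmetic.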
Customers can choose between RAID-1 mirroring or RAID-5/6 to protect against component failures. There are a couple of factors to determine which one to use.
First, as Christos details in his post, a key trade-off between Erasure Coding and RAID-1 mirroring is better space efficiency vs performance. RAID-5/6 is certainly more space efficient than RAID-1 mirroring, but comes at the price of the amplification of I/O operations, and hence may result in some performance degradation (both IOPS and latency). Second, customers need to have enough nodes for both the parity data and for rebuilds with RAID-5/6. While the minimum number of nodes required for RAID-5 is 4, in order to account for node failures, it is a good practice to have 5 nodes. The reason is that in the presence of failures (host or storage), with an additional host in the cluster (and sufficient spare capacity), VSAN can immediately start recovery and bring the data protection up to the level required by the specified policy. Without additional hosts, data protection remains degraded until new hardware is added to the cluster.
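The host-count and capacity trade-offs discussed above can be summarized in a small helper. This is a sketch: the names are hypothetical, the RAID-1 host count assumes the commonly documented 2*FTT+1 rule for mirroring, and the recommended count simply adds the one spare host suggested above so rebuilds can start immediately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layout:
    scheme: str
    min_hosts: int
    capacity_multiplier: float   # raw capacity consumed per unit of VM data

def layout_for(ftt: int, erasure_coding: bool) -> Layout:
    """Map a failures-to-tolerate setting and protection method to a layout."""
    if erasure_coding:
        if ftt == 1:
            return Layout("RAID-5 (3+1)", 4, 4 / 3)
        if ftt == 2:
            return Layout("RAID-6 (4+2)", 6, 6 / 4)
        raise ValueError("erasure coding covers FTT=1 (RAID-5) or FTT=2 (RAID-6) only")
    # RAID-1 mirroring: ftt+1 full replicas, assumed 2*ftt+1 hosts
    return Layout(f"RAID-1 ({ftt + 1} replicas)", 2 * ftt + 1, float(ftt + 1))

def recommended_hosts(layout: Layout) -> int:
    """Add one spare host so a rebuild can begin right after a host failure."""
    return layout.min_hosts + 1

for ftt, ec in ((1, False), (1, True), (2, True)):
    lay = layout_for(ftt, ec)
    print(f"FTT={ftt} {lay.scheme}: min {lay.min_hosts} hosts, "
          f"recommended {recommended_hosts(lay)}, "
          f"{lay.capacity_multiplier:.2f}x raw capacity")
```

For a 100 GB VMDK, these multipliers work out to 200 GB of raw capacity with RAID-1 (FTT=1), about 133 GB with RAID-5, and 150 GB with RAID-6; that saving is what gets weighed against the I/O amplification of erasure coding.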
Which RAID mechanism our customers should choose depends on their performance requirements as well as the number of nodes in their Virtual SAN cluster, and this brings me to the topic of performance and efficiency.
Please note that Erasure Coding does not work with stretched clusters in Virtual SAN 6.2.
Performance and Efficiency Impact
Through internal tests with real workloads, we have been extremely pleased to see that the new features have minimal impact on performance and overhead, even with all the space efficiency features turned on at once. We measured performance for the following workloads:
- VDI, both linked clones and full clones
- Database – Low transactional workloads
- Database – High transactional workloads
- Exchange
We measured the total additional CPU consumed, latency, and IOPS as seen by the clients. In most cases, the total additional CPU consumed from turning on all space efficiency features was less than 5% across the cluster. In addition, there was a negligible drop in IOPS compared to the baseline measurement (i.e., running All-Flash Virtual SAN 6.2 without turning on the space efficiency features). This is an incredible result and speaks volumes about the unique and efficient architecture of Virtual SAN.
Given the performance results we’ve seen, our customers can leverage these features for a large number of their workloads. As with any deployment, we recommend that customers follow our design and sizing guide to accurately size and architect their Virtual SAN.
With respect to space efficiency, we measured efficiency ratios for a variety of workloads. Data efficiency ratios depend largely on the workload type. To give you a sense of what that means in practice, with VDI full clones, we’ve seen close to 8x capacity reduction; with a transactional workload like SQL Server, we’ve seen up to 2x capacity reduction; with Exchange, we’ve seen up to 4x capacity reduction. We’re in the process of publishing a performance and space efficiency white paper that will discuss our testing results in detail.
In summary, Virtual SAN 6.2 is a banner release for VMware. Our goal is to delight our customers with the features we deliver and to deliver them at the lowest cost possible. Data efficiency has been the most awaited feature set since we released Virtual SAN in March 2014. With it, All-Flash Virtual SAN is becoming a cost-effective and practical option for a large part of our customer base. Kudos to the entire team who worked tirelessly to deliver these features!