Space efficiency techniques used with storage systems aim to achieve one goal: lower storage costs through reduced capacity consumption. VMware vSAN offers several space efficiency options to meet a variety of customer needs, and vSAN 7 U1 extends this flexibility with another option for space efficiency: "Compression only." Let’s look at why this new option is available, the workloads it is designed for, and the details of its implementation.
Why Another Space Efficiency Option?
VMware vSAN introduced deduplication and compression (DD&C) many years ago, when NAND flash was much more expensive than it is today. The economics of flash at the time inspired VMware to provide a cluster-based service that combined two space-saving techniques as a single feature for maximum space efficiency and simplicity in all-flash storage environments. I describe vSAN’s implementation of this space efficiency feature in detail, along with some of the design considerations, in the post "vSAN Design Considerations – Deduplication and Compression."
Space efficiency techniques are a marvelous innovation, but each type introduces tradeoffs. Some workloads and data may not be ideally suited for certain types of space efficiency. A deduplication engine runs without regard to the data it is processing, which means that if the data does not deduplicate well, the additional computational effort and I/O amplification provide no benefit.
The New "Compression only" Option in vSAN 7 U1
A "Compression only" option alleviates the challenge described above. vSAN administrators can use this setting for clusters with demanding workloads that typically cannot take advantage of deduplication techniques. It accommodates today’s economics of flash storage while maintaining an emphasis on delivering performance for high-demand, latency-sensitive workloads.
Selecting the desired space efficiency option is easy. At the cluster level, the vCenter Server UI now presents three options:
- None
- Compression only
- Deduplication and compression
Figure 1. Cluster-level space efficiency options in vSAN 7 U1
Note that changing this cluster-level setting requires a rolling evacuation of the data in each disk group. The process is automated, but it consumes resources while it runs.
When compared to the DD&C option, the "Compression only" option offers two notable advantages.
Reduced failure domain of a capacity device failure. A failure of a capacity device in a disk group for a cluster using "Compression only" will only impact that discrete storage device, whereas the same failure using DD&C would impact the entire disk group. This smaller impact area also reduces the amount of data that vSAN may need to rebuild after a device failure.
Figure 2. Comparing the failure domain of a capacity device failure in vSAN 7 U1
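The difference in failure domain can be illustrated with a little arithmetic. The following is a minimal sketch, not vSAN code: the function name `rebuild_scope_gb` and the per-device figures are hypothetical, and it assumes a single disk group where each list entry is the amount of data stored on one capacity device.

```python
def rebuild_scope_gb(device_data_gb, failed_index, dedup_enabled):
    """Return the GB of stored data impacted by one capacity-device failure.

    Illustrative model only (hypothetical helper): device_data_gb lists the
    data held on each capacity device of a single disk group.
    """
    if dedup_enabled:
        # With DD&C, a capacity-device failure impacts the entire disk group.
        return sum(device_data_gb)
    # With "Compression only", only the failed device is impacted.
    return device_data_gb[failed_index]


# Hypothetical disk group with four capacity devices (GB of data on each).
disk_group = [1200, 1100, 1300, 1250]

print(rebuild_scope_gb(disk_group, 2, dedup_enabled=True))   # 4850
print(rebuild_scope_gb(disk_group, 2, dedup_enabled=False))  # 1300
```

In this made-up example, the same device failure puts roughly a quarter as much data in scope for a rebuild when "Compression only" is used.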
Increased destaging rates of data from the buffer tier to the capacity tier. As described in "vSAN Design Considerations – Deduplication and Compression," vSAN’s two-tier system ingests writes into a high-performance buffer tier and destages the data to the more value-based capacity tier at a later time. The space efficiency processes occur at the time of destaging and, as described in that post, may affect performance. When compared to DD&C, the "Compression only" feature improves destage rates in two ways: it avoids the write amplification inherent to deduplication techniques, and it uses multiple elevator processes to destage the data.
Figure 3. Multiple elevators destaging data using the "Compression only" option
How much space savings can one expect using the "Compression only" feature? The answer depends on the workload and the type of data being stored. Both the DD&C and "Compression only" features are opportunistic, which means that space savings are not guaranteed. The capacity savings achieved through compression can be easily viewed in the vCenter Server UI. Note that it may take some time before the savings ratio stabilizes.
By contrast, vSAN’s data placement techniques using erasure codes like RAID-5/6 are deterministic: They provide a guaranteed level of space efficiency for data stored in a resilient manner. RAID-5/6 erasure coding can be applied to VMs using storage policies and can be used with cluster-based space efficiency techniques.
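To make "deterministic" concrete, the raw capacity consumed by a resilient object follows directly from its layout. The sketch below is illustrative arithmetic only. It assumes the commonly documented vSAN layouts of 3 data + 1 parity for RAID-5 and 4 data + 2 parity for RAID-6, with RAID-1 mirroring shown for comparison; the names `LAYOUTS`, `raw_capacity_multiplier`, and `raw_gb_needed` are hypothetical.

```python
from fractions import Fraction

# (data components, redundancy components) per layout.
# Assumed layouts for illustration: RAID-5 as 3+1, RAID-6 as 4+2.
LAYOUTS = {
    "RAID-1 (FTT=1)": (1, 1),  # one full mirror copy
    "RAID-5 (FTT=1)": (3, 1),  # 3 data + 1 parity
    "RAID-6 (FTT=2)": (4, 2),  # 4 data + 2 parity
}


def raw_capacity_multiplier(data, redundancy):
    """Raw capacity consumed per unit of usable data, as an exact fraction."""
    return Fraction(data + redundancy, data)


def raw_gb_needed(usable_gb, layout):
    """Raw GB consumed to store usable_gb under the named layout."""
    data, redundancy = LAYOUTS[layout]
    return float(usable_gb * raw_capacity_multiplier(data, redundancy))


for name in LAYOUTS:
    print(f"{name}: 100 GB usable -> {raw_gb_needed(100, name):.1f} GB raw")
```

Unlike the opportunistic savings from compression or deduplication, these multipliers do not depend on the data being written.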
What will the levels of performance be like when using the "Compression only" feature? This will land somewhere in between the performance of your hosts not running any space efficiency, and the performance of your hosts running DD&C.
Performance using "Compression only" could be superior when compared to the same environment using DD&C. This improvement would show up most with workloads that have large working sets and issue large sequential writes or medium-sized random writes. In these cases, the absence of the deduplication engine and the improved parallelization of destaging allow the data to be destaged faster, making the buffer less likely to hit the fullness thresholds that begin to impact guest VM latency.
Figure 4. Improved steady-state performance using the "Compression only" feature (Illustration not to scale)
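The relationship between destage rate and buffer fullness can be sketched with a simple model. This is a deliberately simplified illustration and not how vSAN schedules destaging: the function `seconds_until_threshold`, the threshold percentage, and all rates below are hypothetical values chosen only to show the effect of a faster destage rate.

```python
def seconds_until_threshold(buffer_gb, threshold_pct, ingest_mbps, destage_mbps):
    """Seconds of sustained writes before the buffer reaches a fullness threshold.

    Simplified model with hypothetical inputs: constant ingest and destage
    rates in MB/s, 1 GB = 1000 MB. Returns None when destaging keeps up with
    ingest, so the threshold is never reached.
    """
    net_fill_mbps = ingest_mbps - destage_mbps
    if net_fill_mbps <= 0:
        return None
    threshold_mb = buffer_gb * 1000 * threshold_pct / 100
    return threshold_mb / net_fill_mbps


# Same hypothetical buffer and ingest rate, three hypothetical destage rates.
slow = seconds_until_threshold(600, 30, ingest_mbps=1000, destage_mbps=600)
fast = seconds_until_threshold(600, 30, ingest_mbps=1000, destage_mbps=900)
keeps_up = seconds_until_threshold(600, 30, ingest_mbps=1000, destage_mbps=1000)

print(slow, fast, keeps_up)
```

In this model, a higher destage rate lengthens the time a sustained write burst can run before the threshold is reached, and a destage rate that matches the ingest rate never reaches it.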
The performance capabilities of vSAN are still ultimately determined by the hardware used, the configuration of vSAN, the version of vSAN, the associated storage policies, and the characteristics of the application and workload. To better understand how hardware selection (including the type of flash devices) impacts performance, see the posts "vSAN Design Considerations – Fast Storage Devices versus Fast Networking" and "Write buffer sizing in vSAN when using the very latest hardware."
Compression only, or Deduplication and compression? Which is right for you?
Workloads and data sets do not provide an easy way to know if they are ideally suited for some space efficiency techniques versus others. Therefore, the administrator should decide based on the requirements of the workloads and the constraints of the hardware powering the workloads. A comparison of design and operational considerations between the three options is provided below.
* Capacity savings not guaranteed
** Depends on workloads, working sets, and hardware configuration
For some environments, the smaller failure domain of a capacity device failure may be the only reason needed to justify the use of the "Compression only" feature versus the other options. Whatever the case, the desired configuration can be tailored on a per-cluster basis.
VMware recommends the following settings for the best balance of capacity savings and performance impact. Workloads and environmental conditions vary, so these are generalized recommendations.
* If performance is of the highest priority, using no space efficiency would yield the highest sustained performance for the hardware configuration used.
Recommendation: If you are uncertain as to what is best for your clusters, and you prefer some degree of cluster-level space efficiency with minimal performance impact, choose the new "Compression only" feature.
As the economics of flash storage evolve, so does VMware. The new "Compression only" feature accommodates the prevalence of all-flash media and the demand for performance while acknowledging that not all workloads are suitable for deduplication. This new option is ideal for demanding workloads that have a focus on performance and are unable to take advantage of deduplication. It is also a great option for those who want space efficiency, but with minimal overhead and operational changes when compared to a cluster not running any form of cluster-based space efficiency.