
More Capacity with VMware vSAN Compression and Global Deduplication in VCF 9.1

Driving down the cost of storage is the motivation behind our ongoing work to improve efficiency in vSAN. When you can store more data on the same physical hardware through software enhancements, everybody wins.

vSAN in VMware Cloud Foundation (VCF) 9.1 delivers two enhancements that improve storage efficiency: new data compression capabilities and the general availability of vSAN global deduplication. These cluster-based features make it possible to store more data on the same hardware, and they are the focus of this post.

New Data Compression: Always on, Always Efficient

Data compression in vSAN has historically been achieved through the LZ4 compression algorithm. It is commonly used in all types of systems and processes that are latency-sensitive, as it emphasizes performance and low computational overhead over absolute space efficiency. Even with the introduction of vSAN Express Storage Architecture (ESA) in 2022, we continued to use LZ4, but designed ESA to adapt to other algorithms if needed.

For vSAN ESA in VCF 9.1, we use a different compression algorithm. Zstandard (ZSTD) is a highly versatile and efficient compression algorithm originally developed by Meta. It can offer noticeably higher compression ratios than LZ4 while maintaining good performance and modest CPU overhead. Perhaps most intriguing is that it is adjustable, and we have carefully tuned it for vSAN’s storage stack to offer the best balance of space efficiency and resource overhead. This fine-tuning sets the stage for even more innovation to come in future editions of vSAN.

The compression process still occurs high in the vSAN storage stack, evaluating data at the 4KB block level. It compresses data in increments of 512-byte sectors when the data allows. Any gains provided by the new compression capabilities are also felt throughout the rest of the stack: higher compression ratios mean less data traversing the network and less data to process lower in the storage stack.

Figure 1. Transition to a new compression algorithm in vSAN ESA for VCF 9.1
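
The block-level behavior described above can be sketched in a few lines of Python. This is only an illustration, not vSAN's implementation: zlib from the standard library stands in for the real compressor, the function name `sectors_needed` is hypothetical, and the rule that an incompressible block is kept at its full eight sectors is an assumption.

```python
import random
import zlib

BLOCK_SIZE = 4096   # data is evaluated at the 4KB block level
SECTOR_SIZE = 512   # compressed output is stored in 512-byte increments


def sectors_needed(block: bytes) -> int:
    """Return how many 512-byte sectors one 4KB block occupies after compression.

    zlib is a stand-in compressor (assumption); it is not what vSAN uses.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected exactly one 4KB block")
    compressed_len = len(zlib.compress(block))
    # Round up to whole sectors (ceiling division).
    sectors = -(-compressed_len // SECTOR_SIZE)
    # Assumption: if compression saves nothing, the block stays at full size.
    return min(sectors, BLOCK_SIZE // SECTOR_SIZE)


# A repetitive, record-like block versus a block of pseudo-random bytes.
structured_block = (b"2024-01-01,ORDER,0001;" * 200)[:BLOCK_SIZE]
rng = random.Random(0)
random_block = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))

print(sectors_needed(structured_block), "sector(s) for the structured block")
print(sectors_needed(random_block), "sector(s) for the random block")
```

The rounding step is why savings come in 512-byte steps: a block that compresses to 513 bytes still occupies two sectors.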

While the new compression algorithm in vSAN for VCF 9.1 will benefit nearly all workload types, the biggest beneficiaries will typically be workloads with structured data. Databases such as Microsoft SQL Server and Oracle store integers, dates, repeated headers, keys, and similar values, with patterns that are ideal for this new method of compressing data.

Data compression is now an always-on feature of the cluster. Our comparison with compression used in previous versions of vSAN ESA showed a net improvement even with some data having limited compressibility. The “always-on” approach makes vSAN easier to configure and helps ensure all data will be compressed if the data allows it.

Taking advantage of the feature consists of little more than upgrading to VCF 9.1. After the update to vSAN, the health service will indicate that the “disk format” should be updated. This update performs a non-disruptive, lightweight metadata change that completes in a few seconds. When it finishes, all existing data remains compressed with the older algorithm; data starts using the newer algorithm only once it is read and rewritten. This means it may take some time to see the additional capacity savings on existing clusters that were upgraded. New clusters will see the benefit immediately.
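
To see why savings on an upgraded cluster arrive gradually, consider the toy model below. It is a sketch under stated assumptions, not a description of vSAN internals: the class `ToyDatastore` and its methods are hypothetical, and the only behavior modeled is the one described above, where the disk format update changes metadata and each block picks up the new algorithm when it is next written.

```python
class ToyDatastore:
    """Tracks which algorithm each stored block was last written with."""

    def __init__(self):
        self.write_algorithm = "lz4"   # algorithm applied to new writes
        self.blocks = {}               # block address -> algorithm of stored copy

    def write(self, address: int) -> None:
        # Every write (or rewrite) uses whatever algorithm is current.
        self.blocks[address] = self.write_algorithm

    def upgrade_disk_format(self) -> None:
        # Metadata-only change: existing blocks are not touched.
        self.write_algorithm = "zstd"

    def fraction_on(self, algorithm: str) -> float:
        """Share of stored blocks currently compressed with `algorithm`."""
        if not self.blocks:
            return 0.0
        matching = sum(1 for a in self.blocks.values() if a == algorithm)
        return matching / len(self.blocks)


store = ToyDatastore()
for address in range(4):
    store.write(address)             # four blocks written before the upgrade

store.upgrade_disk_format()
before = store.fraction_on("zstd")   # nothing has been rewritten yet

store.write(0)                       # one block is rewritten after the upgrade
after = store.fraction_on("zstd")

print(before, after)
```

In this model the share of data on the new algorithm grows only as fast as the workload rewrites its blocks, which matches the gradual savings described for upgraded clusters.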

vSAN Global Deduplication

Global deduplication in vSAN ESA was announced in vSAN for VCF 9.0 under “limited availability.” This gave a limited set of customers the opportunity to try this new feature and see some of the extraordinary storage efficiency gains courtesy of vSAN’s cluster-wide deduplication.

Global deduplication will be generally available in vSAN for VCF 9.1. This cluster-based feature can be enabled in the UI and is supported on vSAN HCI clusters of 3 to 64 hosts, as well as on vSAN storage clusters. Unlike the limited release in 9.0, deduplication is supported in clusters using vSAN Data-at-Rest Encryption. While data is encrypted as it is initially written to disk, our deduplication post-processing temporarily decrypts the existing blocks in memory to find and deduplicate matching blocks. This means that you can have data fully encrypted without any negative impact on data reduction ratios!

Figure 2. Support of vSAN Data-at-Rest Encryption with global deduplication in VCF 9.1

Deduplication post-processing has also been optimized for better performance. It reads data and generates cryptographic hashes much more efficiently by doing this work lower in the stack, which uses fewer CPU and network resources.
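
The idea of decrypting blocks in memory and matching them by cryptographic hash can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy_cipher` is a hypothetical XOR keystream that is not real encryption, SHA-256 is chosen only because it is in the Python standard library, and the post does not state which cipher or hash vSAN uses. The toy cipher mixes in the block address so that identical plaintext looks different on disk, which is why matching has to happen on decrypted data.

```python
import hashlib


def toy_cipher(block: bytes, key: bytes, address: int) -> bytes:
    """XOR the block with a keystream derived from the key and block address.

    NOT real encryption; it only mimics ciphertext that differs per address.
    XOR is symmetric, so the same call both encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(block):
        seed = key + address.to_bytes(8, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(seed).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(block, stream))


def find_duplicates(stored: dict, key: bytes) -> dict:
    """Return {duplicate_address: first_address_with_identical_content}."""
    first_seen = {}   # content digest -> first address seen with that content
    duplicates = {}
    for address in sorted(stored):
        # Decrypt in memory only; the stored copy stays encrypted.
        plaintext = toy_cipher(stored[address], key, address)
        digest = hashlib.sha256(plaintext).digest()
        if digest in first_seen:
            duplicates[address] = first_seen[digest]
        else:
            first_seen[digest] = address
    return duplicates


key = b"demo-key"
plain = {0: b"A" * 4096, 1: b"B" * 4096, 2: b"A" * 4096}
stored = {addr: toy_cipher(data, key, addr) for addr, data in plain.items()}

print(stored[0] == stored[2])         # ciphertexts differ despite equal content
print(find_duplicates(stored, key))   # block 2 matches block 0
```

Comparing the stored ciphertext directly would find no matches in this model; hashing the decrypted content does, so encryption does not reduce the number of duplicates found.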

Interpreting space savings is also easier in vSAN for VCF 9.1. Data reduction ratios for deduplication and compression are conveyed using a common ratio format in an all-new UI. More information on this will be available in an upcoming post.

Figure 3. Interpreting space efficiency savings from vSAN global deduplication and compression
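
As a reminder of how a data reduction ratio relates to capacity saved, the arithmetic can be sketched as follows. The function names are hypothetical, the capacity figures are made up for the example, and the "2.50x" display style is an assumption, since the exact format the new UI uses is not described here.

```python
def reduction_ratio(logical_bytes: int, physical_bytes: int) -> float:
    """Ratio of data written by workloads to capacity actually consumed."""
    if logical_bytes < 0 or physical_bytes <= 0:
        raise ValueError("logical must be >= 0 and physical must be > 0")
    return logical_bytes / physical_bytes


def savings_percent(logical_bytes: int, physical_bytes: int) -> float:
    """Percentage of capacity saved relative to storing the data unreduced."""
    if logical_bytes <= 0 or physical_bytes < 0:
        raise ValueError("logical must be > 0 and physical must be >= 0")
    return 100 * (logical_bytes - physical_bytes) / logical_bytes


def format_ratio(ratio: float) -> str:
    # Display style is an assumption, not the documented UI format.
    return f"{ratio:.2f}x"


TB = 10**12
logical = 10 * TB    # illustrative: data as seen by the workloads
physical = 4 * TB    # illustrative: capacity consumed after data reduction

ratio = reduction_ratio(logical, physical)
print(format_ratio(ratio), f"{savings_percent(logical, physical):.0f}% saved")
```

A ratio and a savings percentage describe the same result in two ways: a 2.5x ratio means 60% less capacity consumed, and doubling the ratio does not double the savings.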

Turning off deduplication within a cluster only pauses the deduplication post-processing activity. Data that has been deduplicated will remain that way. Also note that vSAN global deduplication is currently not supported for stretched cluster and 2-Node cluster topologies.

For more information on space efficiency in vSAN for VCF 9.1, see the vSAN Space Efficiency Technologies and vSAN FAQs documents.

Summary

The data compression improvements in vSAN for VCF 9.1 allow you to store more data on the same hardware without noticeable changes in overhead. Paired with the general availability of vSAN global deduplication, these enhancements demonstrate our commitment to redefining storage efficiency and lowering its cost.

@vmpete

