Since its introduction a little over two years ago, the vSAN Express Storage Architecture (ESA) has proven to be an extraordinarily powerful storage platform for our customers. While it delivers almost immediate performance and efficiency improvements for workloads, it also allows our Engineering teams to solve difficult problems in new ways. Look no further than ESA’s remarkable high-performance snapshots, used by vSAN Data Protection, for an excellent example of this innovation.
For vSAN in VMware Cloud Foundation (VCF) 9.0, this leads us to perhaps one of the most significant advances we’ve introduced to our customers. vSAN ESA in VCF 9.0 (P01) includes global deduplication – a simple and effective way to drive down storage costs in your VCF environment without any significant tradeoff in performance.

Simply put, we believe that global deduplication in vSAN ESA is as good as or better than other storage solutions on the market. We’ll explore what deduplication is, how it works, and why it is different from anything provided by the Original Storage Architecture (OSA) in vSAN.
Data Deduplication – What is it and Why it Matters
Data deduplication is a technique used in storage systems to reduce the amount of storage capacity consumed. It finds blocks of data that are identical, keeps a single copy, and replaces the duplicates with small metadata pointers to the location of that copy. The system then reclaims the space the duplicates occupied and presents it as free capacity, which effectively lowers your cost of storage.
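To make the idea concrete, here is a minimal Python sketch of block-level deduplication. It assumes fixed 4 KiB blocks and SHA-256 as the hash. The function names are hypothetical and this is not vSAN code. The sketch treats equal digests as identical data, which is the usual working assumption with a cryptographic hash.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KiB blocks, the granularity the post cites for vSAN ESA


def deduplicate(data: bytes):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, pointers): `store` maps a SHA-256 digest to the block's
    bytes, and `pointers` lists, in order, the digest that stands in for
    each original block.
    """
    store = {}
    pointers = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # only the first copy is kept
        pointers.append(digest)          # every block becomes a small pointer
    return store, pointers


def restore(store, pointers) -> bytes:
    """Rebuild the original data by following each pointer."""
    return b"".join(store[p] for p in pointers)


# Three identical blocks followed by one distinct block.
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, pointers = deduplicate(data)
print(len(pointers), len(store))  # 4 2
assert restore(store, pointers) == data
```

Four logical blocks are preserved as four pointers, but only two physical blocks are kept. The space the other two would have used is what gets reclaimed.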
Storage solutions take a variety of approaches to implementing deduplication. Some designs emphasize higher deduplication rates at the cost of computational resources and a resulting performance impact on workloads. Others are built within the physical constraints of the storage solution, such as failure domains, available computational power, and the impact of write amplification.
Our team looked at a variety of ways to implement deduplication. The primary areas of emphasis in our approach included:
- Balance effectiveness with resource utilization. Achieve the highest practical level of deduplication with a minimal amount of processing effort.
- Maintain Performance. Ensure that there is little to no impact on guest VM workloads.
- Scalability. Ensure the system maintains performance and scalability as the cluster grows. This is especially important given that vSAN’s distributed architecture allows you to increase the capacity by adding more hosts.
Let’s look at how global deduplication in vSAN ESA achieves this.
How vSAN Global Deduplication is Different
The Express Storage Architecture offered our Engineering team a clean slate for the design of deduplication. It eliminated some of the technical constraints that existed in the OSA and made it possible to build a deduplication engine that is highly efficient and flexible. Let’s look at some of the most noticeable differences between how deduplication was designed in the OSA and how it is implemented in ESA.
Deduplication Domain
Deduplication in vSAN OSA used a deduplication domain limited to a single disk group in a host. Identical blocks could not be deduplicated if they resided on different disk groups, which hampered the ability of vSAN OSA to deliver higher deduplication ratios.
Deduplication in vSAN ESA uses the entire cluster as its deduplication domain, so any identical blocks anywhere in the cluster can be deduplicated. Combined with deduplication at a fine 4KB granularity, this dramatically improves the probability of finding duplicate blocks and, with it, data reduction ratios. Using deduplication in ESA does not change the boundary of failure for storage devices: a discrete failure of a storage device will only impact that device.
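A small Python model can show why the size of the deduplication domain matters. It is a hypothetical sketch. The block placement and contents are made up, and the model does not reflect how vSAN actually places data. The sketch counts how many physical copies survive when each domain deduplicates only what it can see.

```python
import hashlib
from collections import defaultdict


def stored_blocks(blocks, domain_of):
    """Count physical blocks kept when each dedup domain works independently.

    blocks: iterable of (block_id, content) pairs.
    domain_of: maps a block_id to the dedup domain that contains it.
    A given content is stored once per domain in which it appears.
    """
    unique_per_domain = defaultdict(set)
    for block_id, content in blocks:
        digest = hashlib.sha256(content).digest()
        unique_per_domain[domain_of(block_id)].add(digest)
    return sum(len(digests) for digests in unique_per_domain.values())


# 16 blocks alternating between two contents (e.g., a shared OS image block
# and a shared application block), spread over 4 disk groups of 4 blocks.
blocks = [(i, b"os-image" if i % 2 == 0 else b"app-binary") for i in range(16)]

per_disk_group = stored_blocks(blocks, lambda i: i // 4)  # OSA-style domains
cluster_wide = stored_blocks(blocks, lambda i: 0)         # one global domain
print(per_disk_group, cluster_wide)  # 8 2
```

With per-disk-group domains, each of the four groups keeps its own copy of both contents. With a single cluster-wide domain, only two copies remain.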
Deduplication Processing and Performance
vSAN OSA performed deduplication inline, as data was destaged to the capacity tier. Although this occurred after the write acknowledgement was sent to the guest, the deduplication process slowed the destaging of data to the capacity tier, which made the write buffer more prone to filling. In environments with slower storage devices, this could increase VM latency when the OSA’s internal congestion handling engaged.
vSAN ESA deduplicates data after it has been persisted to disk and has cooled down. Deduplication does not occur in the hot write path, so effort is not wasted on recently written data that is often deleted or overwritten shortly thereafter. Our intelligent post-processing dynamically determines when to deduplicate data and uses only free CPU cycles to do so. This minimizes interference with guest VM activity and maintains low latency. Our metadata mapping also identifies colder data and prioritizes it for deduplication ahead of hotter data, which helps maintain processing efficiency.
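The scheduling policy described above (skip hot data, favor the coldest data, run only with spare CPU) can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not taken from the post. Each block carries a last-write timestamp, "hot" is defined by a hypothetical `cool_down` window, and spare CPU is modeled as a per-pass `idle_budget`. It is not vSAN's actual heuristic.

```python
import heapq


def pick_dedup_batch(blocks, now, cool_down, idle_budget):
    """Choose which blocks one background dedup pass should process.

    blocks: list of (block_id, last_write_time) pairs.
    now: current time, in the same units as last_write_time.
    cool_down: blocks written more recently than this are treated as hot
        and skipped, since they are likely to be overwritten or deleted.
    idle_budget: maximum number of blocks to process in this pass, standing
        in for spare CPU; 0 models a busy cluster, so nothing is processed.
    Returns block ids ordered coldest (oldest last write) first.
    """
    cold = [(t, b) for b, t in blocks if now - t >= cool_down]
    return [b for _, b in heapq.nsmallest(idle_budget, cold)]


blocks = [("a", 100), ("b", 10), ("c", 50), ("d", 95)]
print(pick_dedup_batch(blocks, now=100, cool_down=30, idle_budget=5))  # ['b', 'c']
print(pick_dedup_batch(blocks, now=100, cool_down=30, idle_budget=1))  # ['b']
print(pick_dedup_batch(blocks, now=100, cool_down=30, idle_budget=0))  # []
```

Blocks `a` and `d` were written too recently and are left alone. When the budget is tight, the coldest block (`b`) goes first.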
How it Works
When deduplication is enabled in a cluster, two or more specialized objects are created on the cluster. These objects are an integral part of the deduplication process.
- Dedup metadata object. This object maintains a hash entry for every 4KB block stored. These hash entries are used to identify other blocks that contain the same data.
- Dedup data object. This object stores the 4KB blocks that have been deduplicated. Dedicated objects are used for deduplicated data to prevent I/O hot spots within a given VM.
Deduplication in ESA is a post-processing activity. The ESA write path remains the same as in clusters that do not use deduplication. Once data resides on the vSAN datastore and the cluster is relatively quiet, vSAN uses unused CPU cycles to look for opportunities to deduplicate it, proceeding as follows:
- vSAN will read a discrete 4KB block and generate a secure cryptographic hash to be stored in the dedup metadata object.
- vSAN will look for a matching hash entry in the dedup metadata.
- If a match is found and its data already resides in the dedup data object, vSAN updates the block with a metadata pointer and reclaims the space.
- If a match is found but its data is not yet in the dedup data object, vSAN moves a copy of the data (locating the original via the back-pointer described below) to the dedup data object, updates both the current and original blocks with a metadata pointer, and reclaims the space.
- If no match is found, vSAN leaves the data as-is. A hash entry is created in the metadata object with a back-pointer to where the data resides, so that if and when a duplicate is identified, it can be deduplicated as described above.

Figure. Deduplication process in vSAN ESA for VCF 9.0.
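The steps listed above can be modeled in a few dozen lines of Python. This is a toy sketch, not vSAN's implementation. The dictionaries stand in for the dedup metadata and dedup data objects, and all names are hypothetical. The sketch also assumes two behaviors the post does not describe. First, it checks that a back-pointer still refers to matching data before using it. Second, it leaves out reference counting and overwrites of already-deduplicated blocks.

```python
import hashlib


class DedupSketch:
    """Toy model of the post-processing steps listed above.

    All structures are hypothetical stand-ins, not vSAN internals:
      blocks   - location -> ("data", bytes) or ("ptr", slot)
      metadata - plays the dedup metadata object:
                 digest -> ("backptr", location) or ("slot", slot)
      dedup    - plays the dedup data object: slot -> bytes
    Reference counting and overwrites of deduplicated blocks are omitted.
    """

    def __init__(self):
        self.blocks = {}
        self.metadata = {}
        self.dedup = {}
        self.reclaimed = 0  # blocks of capacity freed so far

    def write(self, location, data):
        # Normal write path: data is stored in place, untouched by dedup.
        self.blocks[location] = ("data", data)

    def read(self, location):
        kind, value = self.blocks[location]
        return value if kind == "data" else self.dedup[value]

    def process(self, location):
        """One background dedup step for a single block."""
        kind, data = self.blocks[location]
        if kind != "data":
            return  # already points into the dedup data object
        digest = hashlib.sha256(data).digest()  # hash the 4KB block
        entry = self.metadata.get(digest)       # look for a matching entry

        if entry is not None and entry[0] == "slot":
            # Match whose data is already in the dedup data object.
            self.blocks[location] = ("ptr", entry[1])
            self.reclaimed += 1
        elif (entry is not None and entry[0] == "backptr"
              and entry[1] != location
              and self.blocks.get(entry[1]) == ("data", data)):
            # Match with no copy in the dedup data object yet: move one copy
            # there and point both the original (found via the back-pointer)
            # and the current block at it. Two in-place copies become one.
            slot = len(self.dedup)
            self.dedup[slot] = data
            self.blocks[entry[1]] = ("ptr", slot)
            self.blocks[location] = ("ptr", slot)
            self.metadata[digest] = ("slot", slot)
            self.reclaimed += 1
        else:
            # No usable match: leave the data in place and record a
            # back-pointer for when a duplicate turns up later.
            self.metadata[digest] = ("backptr", location)


engine = DedupSketch()
payloads = [b"x" * 4096, b"y" * 4096, b"x" * 4096, b"x" * 4096]
for loc, payload in enumerate(payloads):
    engine.write(loc, payload)
for loc in range(len(payloads)):
    engine.process(loc)
print(len(engine.dedup), engine.reclaimed)  # 1 2
assert all(engine.read(loc) == p for loc, p in enumerate(payloads))
```

The first `x` block only gets a back-pointer. The second triggers the move into the dedup data object. The third is simply re-pointed. In total, three in-place copies collapse to one, and every read still returns the original data.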
The objects responsible for holding the data and metadata are self-managed and largely transparent to the administrator. The scaling of our deduplication engine is also intelligent and automatic: as the host count of the cluster increases, so do the processing resources available for deduplication. Deduplicated data is sharded in an innovative way that distributes it across the hosts in the cluster. This delivers tremendous scalability and performance while minimizing resource utilization. The sharding can also preserve the spatial locality of contiguous data, reducing fragmentation and improving the chances that a large read request can be satisfied from a single shard.
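The post does not describe the sharding scheme itself. As a rough illustration of how sharding can preserve spatial locality, the Python sketch below assigns whole extents, rather than individual blocks, to shards. The extent size, the key format, and the use of the source object's offset are all hypothetical choices made for this example.

```python
import hashlib

BLOCK_SIZE = 4096
EXTENT_BLOCKS = 256  # hypothetical extent size: 256 x 4 KiB = 1 MiB


def shard_for(object_id: str, offset: int, num_shards: int) -> int:
    """Pick a shard for the 4 KiB block at byte `offset` of `object_id`.

    Every block in the same extent maps to the same shard, so a large
    contiguous read tends to be served by one shard, while different
    extents are spread pseudo-randomly across shards (and their hosts).
    """
    extent = offset // (BLOCK_SIZE * EXTENT_BLOCKS)
    key = f"{object_id}:{extent}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % num_shards


# The first 256 blocks of an object share one extent, hence one shard.
first_extent = {shard_for("vm-01-disk", i * BLOCK_SIZE, 16) for i in range(256)}
print(len(first_extent))  # 1
```

Hashing per block would spread load more evenly, but it would scatter a sequential read across many shards. Hashing per extent trades a little balance for locality.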
How Effective is it?
This cluster-wide deduplication in vSAN ESA can yield dramatically higher data reduction ratios. How high? That depends on several factors, including how similar the data is, the characteristics of the data and workloads, and the overall amount of data within a cluster. Our internal testing has shown that global deduplication in vSAN ESA is on par with or better than the deduplication found in many popular storage offerings.
Initially, vSAN will report capacity savings as a single data reduction ratio that includes savings from both data compression and global deduplication. This ratio is expressed as a multiplier: a 2x data reduction means that consumption is just 50% of the original size, whereas 4x means that consumption is just 25% of the original size. Note that “data reduction” is not a standardized term when comparing storage solutions. Some vendors calculate it differently, or include other factors such as thin provisioning.
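The arithmetic behind these multipliers is simple. The Python sketch below shows it, including how deduplication and compression combine under one stated assumption: compression is applied to the data that remains after deduplication. In that case the two factors multiply (2 × 1.5 = 3). The post does not detail vSAN’s exact accounting, so the figures are hypothetical.

```python
def reduction_ratio(logical: float, physical: float) -> float:
    """Data reduction ratio: logical data stored / physical capacity consumed."""
    return logical / physical


def consumed_fraction(ratio: float) -> float:
    """Share of the original size that is actually consumed at a given ratio."""
    return 1 / ratio


print(consumed_fraction(2), consumed_fraction(4))  # 0.5 0.25

# Hypothetical example: 120 GiB of logical data deduplicates 2x to 60 GiB of
# unique blocks, which then compress 1.5x to 40 GiB on disk.
logical_gib = 120
after_dedup = logical_gib / 2.0
physical_gib = after_dedup / 1.5
print(reduction_ratio(logical_gib, physical_gib))  # 3.0
```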
The Advantage of Scalable Deduplication
Pairing cluster-wide deduplication with vSAN’s distributed architecture leads to an interesting benefit for you, and a distinct advantage over traditional storage arrays. Typically, the deduplication domain for a modern storage array is the entire array. As an environment grows and an additional array is added, the deduplication domain is split in two. This reduces the opportunities for identical data to be deduplicated, limiting its effectiveness.
Deduplication in vSAN ESA is cluster-wide and is not subject to this constraint. If an 8-host cluster grows to 16 hosts, storage capacity doubles, but the cluster remains a single deduplication domain. A larger pool of data may offer many more opportunities for deduplication than a smaller one, increasing its effectiveness. Even though the deduplication domain can grow, vSAN’s boundary of failure does not. See the “vSAN Availability Technologies” paper for more information on how vSAN’s distributed object model decouples availability considerations from cluster size.

Figure. Comparing the limitations of traditional storage with vSAN’s scalable deduplication
Beyond Capacity Savings
Global deduplication in vSAN ESA allows you to do more with what you already have. Lowering storage costs is an obvious benefit, but there are other advantages.
- Simpler designs. With increased effective storage density, some aspects of cluster design become easier. For example, perhaps the capacity requirements of a vSAN storage cluster dictated a minimum of 20 hosts, but those 20 hosts would not fit within a single rack, which complicates the network design. After factoring in deduplication, the cluster might need fewer hosts that easily fit within a single rack, simplifying the network requirements.
- Better cost optimization. Remember that complimentary 1 TiB per core of vSAN capacity that you are entitled to with your VCF license? That is for raw capacity. Effective capacity with deduplication can be much greater, which means that your vSAN capacity entitlement in VCF becomes much more valuable with deduplication.
- Improved standardization. The cost savings also strengthen the case for letting the hypervisor manage both compute and storage, whether in an aggregated vSAN HCI deployment or a disaggregated one using vSAN storage clusters. Deployments become more consistent, and budget that storage fabric costs would otherwise consume disproportionately is freed up.
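The sizing and entitlement points above come down to back-of-the-envelope arithmetic, sketched below in Python. All inputs are hypothetical, including the usable capacity per host, the data volume, the core count and the reduction ratios. The sketch also ignores storage policy overhead (such as RAID-5/6 or mirroring), slack space and host failure reserves, all of which a real sizing exercise must include.

```python
import math


def hosts_needed(logical_tib: float, usable_tib_per_host: float,
                 reduction_ratio: float) -> int:
    """Hosts required to hold `logical_tib` of data at a given reduction ratio."""
    physical_tib = logical_tib / reduction_ratio
    return math.ceil(physical_tib / usable_tib_per_host)


def effective_capacity_tib(cores: int, reduction_ratio: float,
                           tib_per_core: float = 1.0) -> float:
    """Effective capacity of a raw per-core entitlement at a given ratio."""
    return cores * tib_per_core * reduction_ratio


# Hypothetical inputs: 600 TiB of data, 30 TiB usable per host.
print(hosts_needed(600, 30, 1.0))  # 20
print(hosts_needed(600, 30, 2.0))  # 10
# Hypothetical: 256 licensed cores at 1 TiB raw each, with a 2x ratio.
print(effective_capacity_tib(256, 2.0))  # 512.0
```

In this illustration, a 2x ratio halves the host count from 20 to 10 and doubles what the same raw entitlement can hold.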
Availability
The information shared in this post is intended to help customers determine their level of interest in the feature when it is released in VCF 9.0 patch 01 (P01). It is initially a limited release, available through Broadcom’s Technical Qualification Request (TQR) program. If you are interested in trying it out with the P01 release of VCF 9.0, you can contact Broadcom for more details. We will be focusing on customers who would like to enable it on single-site vSAN HCI or vSAN storage clusters of 3 to 16 hosts using 25GbE or faster networking. Initially, some topologies, such as stretched clusters, and some data services, such as data-at-rest encryption, will not be supported with this feature.
Summary
vSAN ESA gives our Product and Engineering teams the ability to deliver capabilities that were simply not possible in the past. We’ve delivered new functionality, and with vSAN global deduplication, it is the customer who wins.